B.1 Introduction

The objective of these lectures is to provide a practical introduction to strong 
gravitational lensing including the data, the theory, and the application of 
strong lensing to other areas of astrophysics. This is Part 2 of the complete 
Saas Fee lectures on gravitational lensing. Part 1 (Schneider 2004) provides 
a basic introduction, Part 2 (Kochanek 2004) examines strong gravitational 
lenses, Part 3 (Schneider 2004) explores cluster lensing and weak lensing, 
and Part 4 (Wambsganss 2004) examines microlensing.¹ The complete lectures
provide an updated summary of the field from Schneider, Ehlers & Falco (1992).
There are also many earlier (and shorter!) reviews of strong lensing (e.g.
Blandford & Kochanek 1987a, Blandford & Narayan 1992, Refsdal & Surdej 1994,
Wambsganss 1998, Narayan & Bartelmann 1999, Courbin, Saha & Schechter 2002,
Claeskens & Surdej 2002).

It is not my objective in this lecture to provide a historical review, care- 
fully outlining the genealogy of every development in gravitational lensing, 
but to focus on current research topics. Part 1 of these lectures summarizes 
the history of lensing and introduces most of the basic equations of lensing. 
The present discussion is divided into 9 sections. We start in §B.2 with an
introduction to the observational data. In §B.3 we outline the basic principles
of strong lenses, building on the general theory of lensing from Part 1. In §B.4
we discuss modeling gravitational lenses and the determination of the mass
distribution of lens galaxies. In §B.5 we discuss time delays and the Hubble
constant. In §B.6 we discuss gravitational lens statistics and the cosmological
model. In §B.7 we discuss the differences between galaxies and clusters as
lenses. In §B.8 we discuss the effects of substructure or satellites on
gravitational lenses. In §B.9 we discuss the optical properties of lens galaxies
and in §B.10 we discuss extended sources and quasar host galaxies. Finally in
§B.11 we discuss the future of strong gravitational lensing.

It will be clear to readers already familiar with the field that these are
my lectures on strong lensing rather than an attempt to achieve a mythical
consensus. I have tried to make clear what matters (and what does not), what
lensing can do (and cannot do) for astrophysics, where the field is serving the
community well (and poorly), and where non-experts have understood the
consequences (or have failed to do so). Doing so requires having definite
opinions with which other researchers may well disagree. We will operate
on the assumption that anyone who disagrees sufficiently violently will have
an opportunity to wreak a horrible revenge at a later date by spending six
months doing their own set of lectures.

¹ For astro-ph users, the lectures should be referenced as: Kochanek, C.S.,
Schneider, P., Wambsganss, J., 2004, Gravitational Lensing: Strong, Weak &
Micro, Proceedings of the 33rd Saas-Fee Advanced Course, G. Meylan, P. Jetzer &
P. North, eds. (Springer-Verlag: Berlin).

B.2 An Introduction to the Data

There are now 82 candidates for multiple image lenses besides those found 
in rich clusters. Of these candidates, there is little doubt about 74 of them. 
The ambiguous candidates consist of faint galaxies with nearby arcs and no 
spectroscopic data. Indeed, the absence of complete spectroscopic informa- 
tion is the bane of most astrophysical applications of lenses. Less than half 
(38) of the good candidates have both source and lens redshifts - 43 have 
lens redshifts, 64 have source redshifts, and 5 have neither redshift. Much 
of this problem could be eliminated in about 5 clear nights of 8m time, 
but no TAC seems willing to devote the effort even though lens redshifts 
probably provide more cosmological information per redshift than any other 
sparsely distributed source. Of these 74 lenses, 10 have had their central ve- 
locity dispersions measured and 10 have measured time delays. A reasonably 
complete summary of the lens data is available at the CASTLES WWW
site http://cfa-www.harvard.edu/castles/, although lack of manpower means
that it is updated only episodically.

Fig. B.1 shows the distribution of the lenses in image separation and
source redshift. The separations of the images range from 0."35 to 15."9
(using either half the image separations or the mean distance of the images
from the lens). The observed distribution combines both the true separation
distribution and selection effects. For example, in simple statistical models
using standard models for galaxy properties we would expect to find that the
logarithmic separation distribution $dN/d\ln\Delta\theta$ is nearly constant at
small separations (i.e. $dN/d\Delta\theta \propto \Delta\theta$, §B.6), while the
raw, observed distribution shows a cutoff due to the finite resolution of lens
surveys (typically 0."25 to 1."0 depending on the survey). The cutoff at larger
separations is real, and it is a consequence of the vastly higher lensing
efficiency of galaxies relative to clusters created by the cooling of the
baryons in galaxies (see §B.7).

Fig. B.2 shows the distribution in image separation and lens galaxy
redshift. There is no obvious trend in the typical separation with redshift, as
might be expected if there were rapid evolution in the typical masses of
galaxies. Unfortunately, there is also an observational bias to measure the
redshifts of large separation lenses, where the lens galaxies tend to be brighter
and less confused with the images, which makes quantitative interpretation
of any trends in separation with redshift difficult. There is probably also a
bias against finding large separation, low lens redshift systems because the
flux from the lens galaxy will more easily mask the flux from the source. We
examine the correlations between image separations and lens magnitudes in 

In almost all cases the lenses have geometries that are "standard" for 
models in which the angular structure of the gravitational potential is domi- 
nated by the quadrupole moments of the density distribution, either because 
the lens is ellipsoidal or because the lens sits in a strong external (tidal) 
shear field. Of the 60 lenses where a compact component (quasar or radio 
core) is clearly identifiable, 36 are doubles, 2 are triples, 20 are quads, 1 has 
five images and 1 has six images. The doubles and quads are the standard 
geometries produced by standard lenses with nearly singular central surface 
densities. Examples of these basic patterns are shown in Figs. B.3 and B.4.

In a two-image lens like HE1104-1805 (Wisotzki et al. 1993), the images
usually lie at markedly different distances from the lens galaxy because the
source must be offset from the lens center to avoid producing four images. The
quads show three generic patterns depending on the location of the source
relative to the lens center and the caustics. There are cruciform quads like
HE0435-1223 (Wisotzki et al. 2002), where the images form a cross pattern
bracketing the lens. These are created when the source lies almost directly
behind the lens. There are fold-dominated quads like PG1115+080 (Weymann
et al. 1980), where the source is close to a fold caustic and we observe
a close pair of highly magnified images. Finally, there are cusp-dominated
quads like RXJ1131-1231 (Sluse et al. 2003), where the source is close to a
cusp caustic and we observe a close triple of highly magnified images. These
are all generic patterns expected from caustic theory, as we discuss in Part 1
and §B.3. We will discuss the relative numbers of doubles and quads in §B.6.

The lenses with non-standard geometries all have differing origins. One
triple, APM08279+5255 (Irwin et al. 1998, Ibata et al. 1999, Muñoz, Kochanek
& Keeton 2001), is probably an example of a disk or exposed cusp lens (see
§B.3), while the other, PMNJ1632-0033 (Winn et al. 2002a, Winn, Rusin
& Kochanek 2004), appears to be a classical three image lens with the
third image in the core of the lens (Fig. B.5). The system with five images,
PMNJ0134-0931 (Winn et al. 2002c, Keeton & Winn 2003, Winn et
al. 2003), is due to having two lens galaxies, while the system with six images,
B1359+154 (Myers et al. 1999, Rusin et al. 2001), is a consequence of having
three lens galaxies inside the Einstein ring. Many lenses have luminous
satellites that are required in any successful lens model, such as the satellites
known as "Object X" in MG0414+0534 (Hewitt et al. 1992, Schechter &
Moore 1993) and object D in MG2016+112 (Lawrence et al. 1984) shown in
Fig. B.6. These satellite galaxies can be crucial parts of lens models, although
there has been no systematic study of their properties in the lens sample.

If the structure of the source is more complicated, then the resulting image
geometries become more complicated. For example, the source of the
radio lens B1933+503 (Sykes et al. 1998) consists of a radio core and two radio
lobes, leading to 10 observable images because the core and one lobe are
quadruply imaged and the other lobe is doubly imaged (Fig. B.7). If instead
of discrete emission peaks there is a continuous surface brightness
distribution, then we observe arcs or rings surrounding the lens galaxy. Fig. B.8
shows examples of Einstein rings for the case of MG1131+0456 (Hewitt et
al. 1988) in both the radio (Chen & Hewitt 1993) and the infrared (Kochanek
et al. 2000). The radio ring is formed from an extended radio jet, while the
infrared ring is formed from the host galaxy of the radio source. We also chose
most of the examples in Figs. B.3 and B.4 to show prominent arcs and rings
formed by lensing the host galaxy of the source quasar. We discuss arcs and
rings in §B.10.

Fig. B.1. The distribution of lens galaxies in separation $\Delta\theta$ and
source redshift $z_s$. The solid histogram shows the distribution in separation
for all the lenses while the dashed histogram shows the distribution of those
with unmeasured source redshifts.

Fig. B.2. The distribution of lens galaxies in separation $\Delta\theta$ and
lens redshift $z_l$. The solid histogram shows the distribution in separation
for all the lenses while the dashed histogram shows the distribution of those
with unmeasured lens redshifts. There are no obvious correlations between lens
redshift $z_l$ and separation $\Delta\theta$, but the strong selection bias that
small separation lenses are less likely to have measured redshifts makes this
difficult to interpret. There may also be a deficit of low redshift, large
separation lenses, which may be a selection bias created by the difficulty of
finding quasar lenses embedded in bright galaxies.

Fig. B.3. Standard image geometries. (Top) The two-image lens HE1104-1805. G
is the lens galaxy and A and B are the quasar images. We also see arc images of
the quasar host galaxy underneath the quasar images. (Bottom) The four-image
lens PG1115+080 showing the bright A1 and A2 images created by a fold caustic.
(Top, next page) The four-image lens RXJ1131-1231 showing the bright A, B and C
images created by a cusp caustic. (Bottom, next page) The four-image lens
HE0435-1223, showing the cruciform geometry created by a source near the center
of the lens. For each lens we took the CASTLES H-band image, subtracted the
bright quasars and then added them back as Gaussians with roughly the same FWHM
as the real PSF. This removes the complex diffraction pattern of the HST PSF and
makes it easier to see low surface brightness features.

Fig. B.5. PMN1632-0033 is the only known lens with a "classical" third image
in the core of the lens galaxy. The center of the lens galaxy is close to the
faint C image. Images A, B and C have identical radio spectra except for the
longest wavelength flux of C, which can be explained by absorption in the core
of the lens galaxy. Time delay measurements would be required to make the case
absolutely secure. A central black hole in the lens galaxy might produce an
additional image with a flux about 10% that of image C (Winn et al. 2004).

B.3 Basic Principles 

Most gravitational lenses have the standard configurations we illustrated in
§B.2. These configurations are easily understood in terms of the caustic
structures generic to simple lens models. In this section we illustrate the
origin of these basic geometries using simple mathematical examples. We build
on the general outline of lensing theory from Part 1.

Fig. B.6. H-band images of two lenses with small companions that are crucial for
successful models. The upper image shows "Object X" in MG0414+0534, and the
lower image shows component D of MG2016+112. MG2016+112 has the additional
confusion that only A and B are images of the quasar (Koopmans et al. 2002).
Image C is some combination of emission from the quasar jet (it is an extended
X-ray source, Chartas et al. 2001) and the quasar host galaxy. Object D is known
to be at the same redshift as the primary lens galaxy G (Koopmans & Treu 2002).

Fig. B.7. A MERLIN map of B1933+503 showing the 10 observed images of the three
component source (Marlow et al. 1999). The flat radio spectrum core is lensed
into images 1, 3, 4 and 6. One radio lobe is lensed into images 1a and 8, while
the other is lensed into images 2, 7 and 5. Image 2 is really two images merging
on a fold.

B.3.1 Some Nomenclature 

Throughout this lecture we use comoving angular diameter distances (also
known as proper motion distances) rather than the more familiar angular
diameter distances because almost every equation in gravitational lensing
becomes simpler. The distance between two redshifts i and j is

$$D_{ij} = \frac{r_H}{|\Omega_k|^{1/2}}\,\mathrm{sinn}\left[\int_i^j \frac{|\Omega_k|^{1/2}\,dz}{\left[(1+z)^2(1+\Omega_M z) - z(2+z)\Omega_\Lambda\right]^{1/2}}\right] \qquad \mathrm{(B.1)}$$



where $\Omega_M$, $\Omega_\Lambda$ and $\Omega_k = 1 - \Omega_M - \Omega_\Lambda$
are the present day matter density, cosmological constant and "curvature"
density respectively, $r_H = c/H_0$ is the Hubble radius, and the function
sinn(x) becomes sinh(x), x or sin(x) for open ($\Omega_k > 0$), flat
($\Omega_k = 0$) and closed ($\Omega_k < 0$) models (Carroll, Press & Turner
1992). We use $D_d$, $D_s$ and $D_{ds}$ for the distances from the
observer to the lens, from the observer to the source and from the lens to the
source. These distances are trivially related to the angular diameter distances,
$D^{ang}_{ij} = D_{ij}/(1+z_j)$, and luminosity distances,
$D^{lum}_{ij} = D_{ij}(1+z_j)$. In a flat universe, one can simply add comoving
angular diameter distances ($D_s = D_d + D_{ds}$), which is not true of angular
diameter distances. The comoving volume element is

$$dV = \frac{4\pi D_d^2\,dD_d}{\left(1 + \Omega_k r_H^{-2} D_d^2\right)^{1/2}} \rightarrow 4\pi D_d^2\,dD_d \qquad \mathrm{(B.2)}$$

for flat universes. We denote angles on the lens plane by
$\vec{\theta} = \theta(\cos\chi, \sin\chi)$ and angles on the source plane by
$\vec{\beta}$. Physical lengths on the lens plane are
$\vec{\xi} = D^{ang}_d \vec{\theta}$. The lensing potential, denoted by
$\Psi(\vec{\theta})$, satisfies the Poisson equation $\nabla^2\Psi = 2\kappa$
where $\kappa = \Sigma/\Sigma_c$ is the surface density $\Sigma$ in units of the
critical surface density $\Sigma_c = c^2(1+z_l)D_s/(4\pi G D_d D_{ds})$. For a
more detailed review of the basic physics, see Part 1.

Fig. B.8. The radio (top) and H-band (bottom) rings in MG1131+0456. The
radio map was made at 8 GHz by Chen & Hewitt (1993), while the H-band image
is from Kochanek et al. (2000). The radio source D is probably another example
of a central odd image, but the evidence is not as firm as that for PMN1632-0033.
Note the perturbing satellite galaxies (G9 and G15) in the H-band image.
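The distance definition of Eqn. B.1 is easy to evaluate numerically. The following is a minimal sketch, not code from the lectures: the function names (`sinn`, `comoving_distance`), the default cosmology ($\Omega_M = 0.3$, $\Omega_\Lambda = 0.7$) and the use of Simpson's rule are all assumptions made for illustration. Distances are returned in units of the Hubble radius $r_H = c/H_0$.

```python
import math

def sinn(x, omega_k):
    """sinh(x), x or sin(x) for open, flat or closed models (see Eqn. B.1)."""
    if omega_k > 0:
        return math.sinh(x)
    if omega_k < 0:
        return math.sin(x)
    return x

def comoving_distance(z_i, z_j, omega_m=0.3, omega_l=0.7, steps=2000):
    """D_ij of Eqn. B.1 in units of r_H = c/H_0 (illustrative helper).

    `steps` must be even because the integral uses composite Simpson's rule.
    """
    omega_k = 1.0 - omega_m - omega_l

    def integrand(z):
        e2 = (1 + z) ** 2 * (1 + omega_m * z) - z * (2 + z) * omega_l
        return 1.0 / math.sqrt(e2)

    h = (z_j - z_i) / steps
    total = integrand(z_i) + integrand(z_j)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(z_i + k * h)
    chi = total * h / 3.0            # the integral without the |Omega_k|^(1/2) factor

    if abs(omega_k) < 1e-12:         # flat model: sinn(x) = x
        return chi
    root = math.sqrt(abs(omega_k))
    return sinn(root * chi, omega_k) / root

def angular_diameter_distance(z_i, z_j, **cosmo):
    """D^ang_ij = D_ij / (1 + z_j)."""
    return comoving_distance(z_i, z_j, **cosmo) / (1 + z_j)
```

In a flat model the sketch reproduces the additivity $D_s = D_d + D_{ds}$ quoted above, which is a convenient sanity check on any implementation.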



B.3.2 Circular Lenses

While one of the most important lessons about modeling gravitational lenses
in the real world is that you can never (EVER!)² safely neglect the angular
structure of the gravitational potential, it is still worth starting with circular
lens models. They provide a basic introduction to many of the elements which
are essential to realistic models without the need for numerical calculation.
In a circular lens, the effective lens potential (Part 1) is a function only of
the distance from the lens center $\theta = |\vec{\theta}|$. Rays are radially
deflected by the angle

$$\alpha(\theta) = \frac{2}{\theta}\int_0^\theta \theta\,d\theta\,\kappa(\theta) = \frac{4GM(<\xi)}{c^2\xi}\,\frac{D_{ds}}{D_s} \qquad \mathrm{(B.3)}$$

where we recall from Part 1 that $\kappa(\theta) = \Sigma(\theta)/\Sigma_c$ is
the surface density in units of the critical surface density, $D_{ds}$ and $D_s$
are the lens-source and observer-source comoving distances and
$\xi = D^{ang}_d\theta$ is the proper distance from the lens. The bend angle is
simply twice the Schwarzschild radius of the enclosed mass, $4GM(<\xi)/c^2$,
divided by the impact parameter $\xi$ and scaled by the distance ratio
$D_{ds}/D_s$.

The lens equation (see Part 1) becomes

$$\beta = \theta\left[1 - \alpha(\theta)/\theta\right] = \theta\left[1 - \langle\kappa(\theta)\rangle\right] \qquad \mathrm{(B.4)}$$

where

$$\langle\kappa(\theta)\rangle = \frac{2}{\theta^2}\int_0^\theta \theta\,d\theta\,\kappa(\theta) = \frac{\alpha(\theta)}{\theta} \qquad \mathrm{(B.5)}$$

is the average surface density interior to $\theta$ in units of the critical
density. Note that there must be a region with $\langle\kappa\rangle > 1$ to
have solutions on both sides of the lens center. Because of the circular
symmetry, all images will lie on a line passing through the source and the lens
center.

² AND I MEAN EVER! DON'T EVEN THINK OF IT!
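The relation between the convergence profile, the deflection and the mean enclosed convergence (Eqns. B.3 to B.5) can be checked numerically for any circular $\kappa(\theta)$. The sketch below is illustrative only: the helper names and the midpoint-rule integration are assumptions, chosen because the midpoint rule never evaluates $\kappa$ at $\theta = 0$, where cusped profiles diverge.

```python
def deflection_from_kappa(kappa, theta, steps=20000):
    """alpha(theta) = (2/theta) * integral_0^theta t kappa(t) dt (Eqn. B.3)."""
    h = theta / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h            # midpoint of each sub-interval
        total += t * kappa(t)
    return 2.0 * total * h / theta

def mean_kappa(kappa, theta):
    """<kappa(theta)> = alpha(theta) / theta (Eqn. B.5)."""
    return deflection_from_kappa(kappa, theta) / theta

def source_position(kappa, theta):
    """beta = theta * (1 - <kappa(theta)>) for theta > 0 (Eqn. B.4)."""
    return theta * (1.0 - mean_kappa(kappa, theta))
```

For example, a profile $\kappa = b/2\theta$ gives a constant deflection $\alpha = b$, so `mean_kappa` equals unity at $\theta = b$ and `source_position` returns zero there.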

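The inverse magnification tensor (or Hessian, see Part 1) also has a simple
form, with

$$M^{-1} = \frac{d\vec{\beta}}{d\vec{\theta}} = (1-\kappa)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \gamma\begin{pmatrix} \cos 2\chi & \sin 2\chi \\ \sin 2\chi & -\cos 2\chi \end{pmatrix} \qquad \mathrm{(B.6)}$$

where $\vec{\theta} = \theta(\cos\chi, \sin\chi)$. The convergence (surface
density) is

$$\kappa = \frac{1}{2}\left(\frac{\alpha}{\theta} + \frac{d\alpha}{d\theta}\right) \qquad \mathrm{(B.7)}$$

and the shear is

$$\gamma = \frac{1}{2}\left(\frac{\alpha}{\theta} - \frac{d\alpha}{d\theta}\right). \qquad \mathrm{(B.8)}$$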

The eigenvectors of $M^{-1}$ point in the radial and tangential directions, with
a radial eigenvalue of $\lambda_+ = 1 - \kappa + \gamma = 1 - d\alpha/d\theta$
and a tangential eigenvalue of
$\lambda_- = 1 - \kappa - \gamma = 1 - \alpha/\theta = 1 - \langle\kappa\rangle$.
If either one of these eigenvalues is zero, the
magnification diverges and we are on either the radial or tangential critical
curve. If we can resolve the images, we will see the images radially magnified
near the radial critical curve and tangentially magnified near the tangential
critical curve. For example, all the quasar host galaxies seen in Figs. B.3 and
B.4 lie close to the tangential critical line and are stretched tangentially to
form partial or complete Einstein rings. The signs of the eigenvalues
$\lambda_\pm$ give the parities of the images and the type of time delay
extremum associated with the images. If both eigenvalues are positive, the image
is a minimum. If both are negative, the image is a maximum. If one is positive
and the other negative, the image is a saddle point. The inverse of the total
magnification $\mu^{-1} = |M^{-1}|$ is the product of the eigenvalues, so it is
positive for minima and maxima and negative for saddle points. The signs of the
eigenvalues are referred to as the partial parities of the images, while the
sign of the total magnification is referred to as the total parity.
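The classification rules above translate directly into a few lines of code. This is a hedged sketch with a hypothetical function name (`classify_image`); the partial parities are reported in the order (tangential, radial), which is an arbitrary convention adopted here.

```python
def classify_image(lam_tan, lam_rad):
    """Classify an image from the eigenvalues of the inverse magnification tensor.

    lam_tan = 1 - kappa - gamma and lam_rad = 1 - kappa + gamma for a circular
    lens. Returns (type, partial parities, total magnification).
    """
    if lam_tan == 0.0 or lam_rad == 0.0:
        raise ValueError("image lies on a critical curve: magnification diverges")
    parities = ("+" if lam_tan > 0 else "-") + ("+" if lam_rad > 0 else "-")
    kind = {"++": "minimum", "--": "maximum"}.get(parities, "saddle point")
    magnification = 1.0 / (lam_tan * lam_rad)   # inverse of the eigenvalue product
    return kind, parities, magnification
```

The sign of the returned magnification is the total parity: positive for minima and maxima, negative for saddle points.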

It is useful to use simple examples to illustrate the behavior of circular
lenses for different density profiles. In most previous lensing reviews, the
examples are based on lenses with finite core radii. However, most currently
popular models of galaxies and clusters have central density cusps rather
than core radii, so we will depart from historical practice and focus on the
power-law lens (e.g. Evans & Wilkinson 1998). Suppose, in three dimensions,
that the lens has a density distribution $\rho \propto r^{-n}$. Such a lens will
produce deflections of

$$\alpha(\theta) = b\left(\frac{\theta}{b}\right)^{2-n} \qquad \mathrm{(B.9)}$$

as shown in Fig. B.9 with convergence and shear profiles

$$\kappa(\theta) = \frac{3-n}{2}\left(\frac{\theta}{b}\right)^{1-n} \quad\mathrm{and}\quad \gamma(\theta) = \frac{n-1}{2}\left(\frac{\theta}{b}\right)^{1-n}. \qquad \mathrm{(B.10)}$$

The power law lenses cover most of the simple, physically interesting models.
The point mass lens is the limit $n \to 3$, with deflection $\alpha = b^2/\theta$,
convergence $\kappa = 0$ (with a central singularity) and shear
$\gamma = b^2/\theta^2$. The singular isothermal sphere (SIS) is the case with
$n = 2$. It has a constant deflection $\alpha = b$, and equal convergence and
shear $\kappa = \gamma = b/2\theta$. A uniform critical sheet is
the limit $n \to 1$ with $\alpha = \theta$, $\kappa = 1$ and $\gamma = 0$. Models
with $n = 3/2$ have the cusp exponent of the Moore (1998) halo model. The
popular $\rho \propto 1/r$ NFW (Navarro, Frenk & White 1996, see §B.4.1) density
cusps are not quite the same as the $n \to 1$ case because the projected surface
density of a $\rho \propto 1/r$ cusp has $\kappa \propto \ln\theta$ rather than a
constant. Nonetheless, the behavior of the power law models as $n \to 1$ will be
very similar to the NFW model if the lens is dominated by the central cusp. The
central regions of galaxies probably act like cusps with
$1 \lesssim n \lesssim 2$.
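The special cases listed above all follow from the same three expressions, which can be sketched as a single function. The name `power_law_lens` and the default parameters are hypothetical; the convergence and shear used here are the ones obtained by inserting the power-law deflection into Eqns. B.7 and B.8.

```python
def power_law_lens(theta, b=1.0, n=2.0):
    """Deflection, convergence and shear of the rho ~ r^-n lens at radius theta > 0.

    b is the Einstein radius; all angles share the same (arbitrary) units.
    """
    x = (theta / b) ** (1.0 - n)
    alpha = theta * x                 # equals b * (theta/b)^(2-n)
    kappa = 0.5 * (3.0 - n) * x       # (alpha/theta + dalpha/dtheta) / 2
    gamma = 0.5 * (n - 1.0) * x       # (alpha/theta - dalpha/dtheta) / 2
    return alpha, kappa, gamma
```

Calling it with `n=3`, `n=2` and `n=1` reproduces the point mass, SIS and uniform critical sheet limits quoted in the text.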

The tangential magnification eigenvalue of these models is

$$\lambda_- = 1 - \kappa - \gamma = 1 - \frac{\alpha}{\theta} = 1 - \langle\kappa\rangle = 1 - \left(\frac{\theta}{b}\right)^{1-n} \qquad \mathrm{(B.11)}$$

which is always equal to zero at $\theta = b \equiv \theta_E$. This circle
defines the tangential critical curve or Einstein (ring) radius of the lens. We
normalized the models in this fashion because the Einstein radius is usually the
best-determined parameter of any lens model, in the sense that all successful
models will find nearly the same Einstein radius (e.g. Kochanek 1991a,
Wambsganss & Paczyński 1994). The source position corresponding to the
tangential critical curve is the origin ($\beta = 0$), and the reason the
magnification diverges is that a point source at the origin is converted into a
ring on the tangential critical curve leading to a divergent ratio between the
"areas" of the source and the image. The other important point to notice is that
the mean surface density inside the tangential critical radius is
$\langle\kappa\rangle = 1$ independent of the model.
This is true of any circular lens. With the addition of angular structure it
is not strictly true, but it is a very good approximation unless the mass
distribution is very flattened. The definition of $b$ in terms of the properties
of the lens galaxy will depend on the particular profile. For example, in a point
mass lens ($n \to 3$), $b^2 = (4GM/c^2 D^{ang}_d)(D_{ds}/D_s)$ where $M$ is the
mass, while in an SIS lens ($n = 2$), $b = 4\pi(\sigma_v/c)^2 D_{ds}/D_s$ where
$\sigma_v$ is the (1D) velocity dispersion of the lens. For the other profiles,
$b$ can be defined in terms of some velocity dispersion or mass estimate for the
lens, as we will discuss later in §B.4.9 and §B.6. The radial magnification
eigenvalue of these models is

$$\lambda_+ = 1 - \kappa + \gamma = 1 - \frac{d\alpha}{d\theta} = 1 - (2-n)\left(\frac{\theta}{b}\right)^{1-n} \qquad \mathrm{(B.12)}$$

which can be zero only if $n < 2$. If $n < 2$ the deflection goes to zero at the
origin and the lens has a radial critical curve at
$\theta = b(2-n)^{1/(n-1)} < b$ interior to the tangential critical curve. Models
with $n \geq 2$ have constant ($n = 2$) or rising ($n > 2$) deflection profiles
as we approach the lens center and have zero ($n = 2$) or negative ($n > 2$)
derivatives $d\alpha/d\theta$ at all radii.

Fig. B.9. The bending angles of the power law lens models. Profiles more centrally
concentrated ($n > 2$) than the SIS ($n = 2$) have divergent central deflections,
while profiles more extended ($n < 2$) than SIS have deflection profiles that
become zero at the center of the lens. The $n = 1$ model is not quite an NFW
model because the surface density is constant rather than logarithmic.
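As a quick numerical check of the two critical curves, the eigenvalues of the power-law models can be evaluated directly. The helper names below are hypothetical, and the radial critical radius is only defined here for $n < 2$ with $n \neq 1$, where the closed-form expression applies.

```python
def power_law_eigenvalues(theta, b=1.0, n=2.0):
    """Return (tangential, radial) eigenvalues of M^-1 for the power-law lens."""
    x = (theta / b) ** (1.0 - n)
    lam_tan = 1.0 - x                  # vanishes at theta = b for every n
    lam_rad = 1.0 - (2.0 - n) * x      # can vanish only if n < 2
    return lam_tan, lam_rad

def radial_critical_radius(b=1.0, n=1.5):
    """Radius b * (2 - n)^(1/(n-1)) where the radial eigenvalue vanishes."""
    if n >= 2.0 or n == 1.0:
        raise ValueError("no radial critical curve from this formula for this n")
    return b * (2.0 - n) ** (1.0 / (n - 1.0))
```

For the $n = 3/2$ cusp this gives a radial critical curve at $b/4$, well inside the Einstein radius.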

A nice property of circular lenses is that they allow simple graphical
solutions of the lens equation for arbitrary deflection profiles. There are two
parts to the graphical solution - the first is to determine the radial positions
$\theta_i$ of the images given a source position $\beta$, and the second is to
determine the magnification by comparing the area of the images to the area of
the source. Recall first, that by symmetry, all the images must lie on a line
passing through the source and the lens. Let $\theta$ now be a signed radius that
is positive along this line on one side of the lens and negative on the other.
The lens equation (Eqn. B.4) along the line is simply

$$\alpha(|\theta|)\,\frac{\theta}{|\theta|} = \theta - \beta \qquad \mathrm{(B.13)}$$

where we have rearranged the terms to put the deflection on one side and
the image and source positions on the other. One side of the equation is the
bend angle (Fig. B.9), while the other side of the equation, $\theta - \beta$,
is simply a line of unit slope passing through the source position $\beta$. The
solutions to the lens equation for any source position $\beta$ are the radii
$\theta_i$ where the line crosses the curve.

Fig. B.10. Graphical solutions for the point mass ($n = 3$) lens. The top panel
shows the graphical solution for the radial positions of the images, and the
bottom panel shows the graphical solution for the image structure. Note the
strong radial demagnification of image 2 produced by the falling deflection
profile.

Fig. B.11. Graphical solutions for the SIS ($n = 2$) lens when $\beta < b$ and
there are two images.

Fig. B.12. Graphical solutions for the SIS ($n = 2$) lens when $\beta > b$ and
there is only one image.

Fig. B.13. Graphical solutions for the Moore profile cusp ($n = 3/2$) lens when
$\beta > b/4$ and there is only one image.

Fig. B.14. Graphical solutions for the Moore profile cusp ($n = 3/2$) lens when
$\beta < b/4$ and there are three images. At the top of the lower panel we
illustrate the geometric meaning of the image partial parities defined by the
signs of the magnification tensor eigenvalues (see text).
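The graphical construction has a direct numerical analogue: scan the signed radius for places where the bend-angle curve crosses the line of unit slope, then refine each crossing. The sketch below is one possible way to do this and is not taken from the lectures; the function names, the scan range `theta_max`, the grid size and the use of bisection are all assumptions.

```python
import math

def alpha_power_law(r, b=1.0, n=2.0):
    """Power-law deflection b * (r/b)^(2-n) at unsigned radius r > 0."""
    return b * (r / b) ** (2.0 - n)

def lens_residual(theta, beta, alpha):
    """theta - beta - alpha(|theta|) * sign(theta); zero at an image position."""
    return theta - beta - math.copysign(alpha(abs(theta)), theta)

def find_images(beta, alpha, theta_max=5.0, samples=4000):
    """Signed image radii in (-theta_max, theta_max), in increasing order."""
    step = 2.0 * theta_max / samples
    # Offset grid: theta = 0, where alpha may be singular, is never sampled.
    grid = [-theta_max + (k + 0.5) * step for k in range(samples)]
    images = []
    for lo, hi in zip(grid, grid[1:]):
        if lo < 0.0 < hi:
            continue                      # skip the bracket straddling the lens center
        f_lo = lens_residual(lo, beta, alpha)
        f_hi = lens_residual(hi, beta, alpha)
        if f_lo * f_hi > 0.0:
            continue                      # no crossing in this bracket
        for _ in range(60):               # bisection refinement
            mid = 0.5 * (lo + hi)
            f_mid = lens_residual(mid, beta, alpha)
            if f_lo * f_mid <= 0.0:
                hi = mid
            else:
                lo, f_lo = mid, f_mid
        images.append(0.5 * (lo + hi))
    return images
```

For instance, `find_images(0.5, lambda r: alpha_power_law(r, 1.0, 2.0))` returns the two SIS images, while the same call with `n = 1.5` and a source inside $b/4$ returns three. A coarse grid can miss closely spaced image pairs near a caustic, so `samples` may need to be increased there.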

For understanding any observed lens, it is always useful to first sketch
where the critical lines must lie. Recall from the discussion of caustics in
Part 1, that images are always created and destroyed on critical lines as the
source crosses a caustic, so the critical lines and caustics define the general
structure of the lens. All our power law models have a tangential critical
line at $\theta = b$, which is the solution $\alpha(b) = b$ and corresponds to
the source position $\beta = 0$. The origin, as the projection of the critical
curve onto the source plane, is the tangential caustic (strictly speaking a
degenerate pseudo-caustic) corresponding to the critical line. A point source at
the origin is transformed into an Einstein ring of radius $\theta_E = b$.

The second step of the graphical construction is to determine the angular
structure of the image. For simplicity, suppose the source is an arc with radial
width $\Delta\beta$ and angular width $\Delta\chi$. By symmetry, the angle
subtended by an image relative to the lens center must be the same as that
subtended by the source. For an image at $\theta_i$ and a source at $\beta$, the
tangential extent of the image is $|\theta_i|\Delta\chi$ while that of the source
is $\beta\Delta\chi$. The tangential magnification of the image is simply
$|\theta_i|/\beta = (1 - \alpha(\theta_i)/|\theta_i|)^{-1}$ after making use of
the lens equation (Eqn. B.13), and this is the inverse of the tangential
magnification eigenvalue (Eqn. B.11). The thickness of the arc requires finding
the image radii for the inner and outer edges of the source, $\theta_i(\beta)$
and $\theta_i(\beta + \Delta\beta)$. The ratio of the thickness of the two arcs
is the radial magnification,

$$\frac{\theta_i(\beta + \Delta\beta) - \theta_i(\beta)}{\Delta\beta} \simeq \frac{d\theta}{d\beta} = \left[1 - \frac{d\alpha}{d\theta}\right]^{-1} \qquad \mathrm{(B.14)}$$

and this is simply the inverse of the radial eigenvalue of the magnification
matrix (Eqn. B.12), where we have taken the derivative of the lens equation
(Eqn. B.13) with respect to the source position to obtain the final result.
Thus, the tangential magnification simply reflects the fact that the angle
subtended by the source is the angle subtended by the image, while the radial
magnification depends on the slope of the deflection profile with declining
deflection profiles ($d\alpha/d\theta < 0$) demagnifying the source and rising
profiles magnifying the source.

In Fig. B.10 we illustrate this for the point mass lens ($n \to 3$). From the
shape of the deflection profile, it is immediately obvious that there will be
only two images, one on each side of the lens. If we assume $\beta > 0$, the
first image is a minimum located at

$$\theta_1 = \frac{1}{2}\left(\beta + \sqrt{\beta^2 + 4b^2}\right) \qquad \mathrm{(B.15)}$$

with $\theta_1 > \theta_E$ and positive magnification

$$\mu_1 = \frac{1}{4}\left[\frac{\beta}{\sqrt{\beta^2 + 4b^2}} + \frac{\sqrt{\beta^2 + 4b^2}}{\beta} + 2\right] > 0, \qquad \mathrm{(B.16)}$$

while the second image is a saddle point located at

$$\theta_2 = \frac{1}{2}\left(\beta - \sqrt{\beta^2 + 4b^2}\right) \qquad \mathrm{(B.17)}$$

with $-\theta_E < \theta_2 < 0$ and negative magnification

$$\mu_2 = -\frac{1}{4}\left[\frac{\beta}{\sqrt{\beta^2 + 4b^2}} + \frac{\sqrt{\beta^2 + 4b^2}}{\beta} - 2\right] < 0. \qquad \mathrm{(B.18)}$$

As the source approaches the tangential caustic ($\beta \to 0$) the
magnifications of both images diverge as $\beta^{-1}$ and the image radii
converge to $\theta_E$. As the source moves to infinity, the magnification of the
first image approaches unity and its position approaches that of the source,
while the second image is demagnified by the factor
$|\mu_2| \simeq (b/\beta)^4$ and converges to the position of the lens. The image
separation

$$\Delta\theta = |\theta_1 - \theta_2| = 2b\sqrt{1 + \beta^2/4b^2} > 2b \qquad \mathrm{(B.19)}$$

is always larger than the diameter of the Einstein ring and the total
magnification

$$|\mu_1| + |\mu_2| = \frac{1}{2}\left[\frac{\beta}{\sqrt{\beta^2 + 4b^2}} + \frac{\sqrt{\beta^2 + 4b^2}}{\beta}\right] \qquad \mathrm{(B.20)}$$

is the characteristic light curve expected for isolated Galactic microlensing
events (see Part 4). The point mass lens has one peculiarity that makes it
different from extended density distributions like galaxies in that it has two
images independent of the impact parameter of the source and no radial
caustic. This is a characteristic of any density distribution with a divergent
central deflection ($n > 2$).
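The point mass results are compact enough to code directly. The following sketch uses a hypothetical function name and returns the position and signed magnification of each image for a source at $\beta > 0$, assuming the standard point-mass expressions in terms of $\sqrt{\beta^2 + 4b^2}$.

```python
import math

def point_mass_images(beta, b=1.0):
    """Images of a source at beta > 0 behind a point mass with Einstein radius b.

    Returns ((theta1, mu1), (theta2, mu2)): the minimum and the saddle point.
    """
    root = math.sqrt(beta ** 2 + 4.0 * b ** 2)
    theta1 = 0.5 * (beta + root)
    theta2 = 0.5 * (beta - root)
    mu1 = 0.25 * (beta / root + root / beta + 2.0)
    mu2 = -0.25 * (beta / root + root / beta - 2.0)
    return (theta1, mu1), (theta2, mu2)

def total_magnification(beta, b=1.0):
    """|mu1| + |mu2|, the standard microlensing light-curve amplitude."""
    (_, mu1), (_, mu2) = point_mass_images(beta, b)
    return abs(mu1) + abs(mu2)
```

For a source on the Einstein radius in the source plane ($\beta = b$) the total magnification is $3/\sqrt{5} \simeq 1.34$, the familiar microlensing threshold value, and the signed magnifications always sum to unity.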

The SIS ($n = 2$) model is the "standard" lens model for galaxies. Figs. B.11
and B.12 show the geometric constructions for the images of an SIS lens. If
$0 < \beta < b$, then the SIS lens also produces two images (Fig. B.11). The
first image is a minimum located at

$$\theta_1 = \beta + b \quad\mathrm{with}\quad \theta_1 > b \quad\hbox{and positive magnification}\quad \mu_1 = 1 + b/\beta \qquad \mathrm{(B.21)}$$

and the second image is a saddle point located at

$$\theta_2 = \beta - b \quad\mathrm{with}\quad -b < \theta_2 < 0 \quad\hbox{and negative magnification}\quad \mu_2 = 1 - b/\beta. \qquad \mathrm{(B.22)}$$

The image separation $|\theta_1 - \theta_2| = 2b$ is constant, and the total
magnification $|\mu_1| + |\mu_2| = 2b/\beta$ is a simple power law. The
magnification produced by an SIS lens is purely tangential since the radial
magnification is unity. If, however, $\beta > b$, then there is only one image,
corresponding to the minimum located on the same side of the lens as the source
(see Fig. B.12). This boundary on the source plane at $\beta = b$ between having
two images at smaller radii and only one image at larger radii is a radial
(pseudo)-caustic that can be thought of as being associated with a radial
critical curve at the origin. It is a pseudo-caustic because there are neither
images nor a divergent magnification associated with it.
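The SIS case can be sketched in a handful of lines. The function name `sis_images` is hypothetical; it returns one or two (position, magnification) pairs depending on whether the source lies inside the radial pseudo-caustic at $\beta = b$.

```python
def sis_images(beta, b=1.0):
    """Images of a source at beta > 0 behind an SIS lens with Einstein radius b.

    Returns a list of (theta, mu) pairs: the minimum first, then the saddle
    point if the source lies inside the pseudo-caustic (beta < b).
    """
    images = [(beta + b, 1.0 + b / beta)]
    if beta < b:
        images.append((beta - b, 1.0 - b / beta))
    return images
```

With two images present, the separation is always $2b$ and the summed absolute magnification is $2b/\beta$, which is a useful check on the output.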

Historically the next step is to introduce a core radius to have a model with 
a true radial critical line and caustic (see Part 1, Blandford & Kochanekjl987b 
Kochanek & Blandford lTWl Kovner lTWal Hinshaw & Krauss lTWl Krauss 
& White ITM21 Wallington & Naravan Kochanek HMHajl . Instead we 

will consider the still softer power law model with n = 3/2, which would 
correspond to the central exponent of the "Moore" profile proposed for CDM 
halos (Moore et al. I1998[) . As Fig. IB. 13l shows, there is only one solution for 
|/3| > 6/4, a minimum located at 

6i = i (b + 2/3 + y/b + 4/?) (B.23) 

and with Q\ > b assuming j3 is positive. The magnification expressions are too 
complex to be of much use, but the magnification fi\ diverges at = b when 
the source is on the tangential pseudo-caustic at j3 = 0. As Fig. IB. ill shows, 
we find two additional images once \/3\ < 6/4. The first additional image is a 
saddle point located at 

$$\theta_2 = \frac{1}{2}\left(-b + 2\beta - \sqrt{b}\sqrt{b - 4\beta}\right) \tag{B.24}$$

with $-b < \theta_2 < -b/4$, which has a negative magnification that diverges at
both $\theta_2 = -b$ (the tangential critical curve) and $\theta_2 = -b/4$. This latter radius
defines the radial critical curve where the magnification diverges because the
radial magnification eigenvalue $1 - \kappa + \gamma = 1 - d\alpha/d\theta = 0$ at radius $\theta = b/4$.
The third image is a maximum located at

$$\theta_3 = \frac{1}{2}\left(-b + 2\beta + \sqrt{b}\sqrt{b - 4\beta}\right) \tag{B.25}$$

with $-b/4 < \theta_3 < 0$ and a positive magnification that diverges on the radial
critical curve. As we move the source outward from the center we would see 
images 2 and 3 approach each other, merging on the radial critical line where 
they would have divergent magnifications, and then vanishing to leave only 
image 1. We would see the same pattern if instead of softening the exponent 
we had followed the traditional path and added a core radius to the SIS 
model. With a finite core radius the central deflection profile would pass 
through zero, and this would introduce a radial critical curve and a third 
image which would be a maximum of the time delay surface.

In Fig. B.14 we also illustrate the geometric meaning of the partial parities
(the signs of the magnification eigenvalues). A source structure (the L)
defines the reference shape. Image 1 is a minimum with positive partial parities
(++) defined by the signs of the tangential and radial eigenvalues. The
orientation of image 1 is the same as the source. Image 2 is a saddle point
with mixed partial parities $(-+)$ because the tangential eigenvalue is negative
while the radial eigenvalue is positive. This means that the image is inverted
in the tangential direction relative to the source. Image 3 is a maximum with
negative partial parities $(--)$, so the image is inverted in both the radial and
tangential directions relative to the source. The total parity, the product of
the partial parities, is positive for maxima and minima so the orientation of 
the image can be produced by rotating the source. The total parity of the 
saddle point image is negative, so its orientation cannot be produced by a 
rotation of the source. 
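
As a check on the three-image solution and the partial parities, here is a short sketch under a stated assumption: the $n = 3/2$ deflection law is taken to be $|\alpha| = \sqrt{b|\theta|}$, which reproduces Eqn. B.23 and places the radial critical curve at $\theta = b/4$. For $\beta > 0$ the two inner images then involve $\sqrt{b - 4\beta}$. The function names are hypothetical.

```python
import math


def alpha(theta, b):
    """Assumed n = 3/2 deflection: magnitude sqrt(b |theta|), same sign as theta."""
    return math.copysign(math.sqrt(b * abs(theta)), theta)


def images(beta, b):
    """Image positions for beta > 0: the minimum, plus a saddle and a maximum
    when the source lies inside the radial caustic (beta < b/4)."""
    root_b = math.sqrt(b)
    thetas = [0.5 * (b + 2 * beta + root_b * math.sqrt(b + 4 * beta))]
    if beta < b / 4:
        disc = root_b * math.sqrt(b - 4 * beta)
        thetas.append(0.5 * (-b + 2 * beta - disc))  # image 2, saddle point
        thetas.append(0.5 * (-b + 2 * beta + disc))  # image 3, maximum
    return thetas


def eigenvalues(theta, b):
    """(tangential, radial) eigenvalues: 1 - alpha/theta and 1 - d(alpha)/d(theta)."""
    t = abs(theta)
    return 1.0 - math.sqrt(b / t), 1.0 - 0.5 * math.sqrt(b / t)


b, beta = 1.0, 0.1
for i, theta in enumerate(images(beta, b), start=1):
    residual = theta - alpha(theta, b) - beta       # lens equation residual
    signs = "".join("+" if v > 0 else "-" for v in eigenvalues(theta, b))
    print(f"image {i}: theta={theta:+.4f}, residual={residual:.1e}, parities=({signs})")
```

The printed parities follow the pattern described in the text: (++) for the minimum, (-+) for the saddle point and (--) for the maximum.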

B.3.3 Non-Circular Lenses 

The tangential pseudo-caustic at the origin producing Einstein ring images is 
unstable to the introduction of any angular structure into the gravitational 
potential of the lens. There are two generic sources of angular perturbations. 
The first source of angular perturbations is the ellipticity of the lens galaxy. 
What counts here is the ellipticity of the gravitational potential rather than 
of the surface density. For a lens with axis ratio $q$, ellipticity $\epsilon = 1 - q$, or
eccentricity $e = (1 - q^2)^{1/2}$, the ellipticity of the potential is usually
$\epsilon_\Psi \sim \epsilon/3$; potentials are always rounder than densities. The second source of
angular perturbations is tidal perturbations from any nearby objects. This is
frequently called the "external shear" or the "tidal shear" because it can be
modeled as a linear shearing of the deflections. In all known lenses, quadrupole
perturbations (i.e. $\propto \cos(2\chi)$ where $\chi$ is the azimuthal angle) dominate;
higher order multipoles are certainly present and they can be quantitatively
important, but they are smaller. For example, in an ellipsoid the amplitude
of the $\cos 2m\chi$ multipole scales as $\epsilon_\Psi^m$ (see §B.4.4 and §B.8).

Unfortunately, there is no example of a non-circular lens that can be solved 
in full generality unless you view the nominally analytic solutions to quartic 
equations as helpful. We can make the greatest progress for the case of an 
SIS in an external (tidal) shear field. Tidal shear is due to perturbations from 
nearby objects and its amplitude can be determined by Taylor expanding its 
potential near the lens (see Part 1 and §B.4). Consider a lens with Einstein
radius $\theta_E$ perturbed by an object with effective lens potential $\Psi$ a distance $\theta_p$
away. For $\theta_E \ll \theta_p$ we can Taylor expand the potential of the nearby object
about the center of the primary lens, dropping the leading two terms. 3

3 The first term, a constant, gives an equal contribution to the time delays of all
the images, so it is unobservable when all we can measure is relative delays.
The second term is a constant deflection, which is unobservable when all we can
measure is relative deflections.

This leaves, as the first term with observable consequences,

$$\Psi(\theta) \simeq \frac{1}{2}\,\theta\cdot\nabla\nabla\Psi\cdot\theta = \frac{1}{2}\kappa_p\theta^2 - \frac{1}{2}\gamma_p\theta^2\cos 2(\chi - \chi_p) \tag{B.26}$$

where $\kappa_p$ is the surface density of the perturber at the center of the lens
galaxy and $\gamma_p > 0$ is the tidal shear from the perturber. If the perturber is
an SIS with critical radius $b_p$ and distance $\theta_p$ from the primary lens, then
$\kappa_p = \gamma_p = b_p/2\theta_p$. With this normalization, the angle $\chi_p$ points toward the
perturber. For a circular lens, the shear $\gamma_p = \langle\kappa\rangle - \kappa$ can be expressed in
terms of the surface density of the perturber, and it is larger (smaller) than
the convergence if the density profile is steeper (shallower) than isothermal.

The effects of $\kappa_p$ are observable only if we measure a time delay or have an
independent estimate of the mass of the lens galaxy, while the effects of the
shear are easily detected from the relative positions of the lensed images (see
Part 1). Consider, for example, one component of the lens equation including
an extra convergence,

$$\beta_1 = \theta_1(1 - \kappa_p) - d\Psi/d\theta_1 \tag{B.27}$$

and then simply divide by $1 - \kappa_p$ to get

$$\beta_1/(1 - \kappa_p) = \theta_1 - (d\Psi/d\theta_1)/(1 - \kappa_p). \tag{B.28}$$

The rescaling of the source position $\beta_1/(1 - \kappa_p)$ has no consequences since
the source position is not an observable quantity, while the rescaling of the
deflection is simply a change in the mass of the lens. This is known as the
"mass sheet degeneracy" because it corresponds to adding a constant surface
density sheet to the lens model (Falco, Gorenstein & Shapiro 1985; see
Part 1), and it is an important systematic problem for both strong lenses and
cluster lenses (see Part 3).
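
The degeneracy can be illustrated with a toy calculation. The sketch below assumes an SIS primary lens plus a sheet $\kappa_p$, with the source on the positive axis, and compares it with a sheet-free SIS whose critical radius and source position are both rescaled by $1/(1-\kappa_p)$ as in Eqn. B.28. The function name `images_with_sheet` is hypothetical.

```python
def images_with_sheet(beta, b, kappa_p=0.0):
    """Images (theta, mu) on the source axis for
    beta = theta * (1 - kappa_p) - b * sign(theta), with beta > 0 and kappa_p < 1."""
    k = 1.0 - kappa_p
    thetas = [(beta + b) / k]
    if beta < b:
        thetas.append((beta - b) / k)
    # radial eigenvalue is k, tangential eigenvalue is k - b/|theta|
    return [(t, 1.0 / (k * (k - b / abs(t)))) for t in thetas]


beta, b, kappa_p = 0.3, 1.0, 0.2
with_sheet = images_with_sheet(beta, b, kappa_p)
rescaled = images_with_sheet(beta / (1 - kappa_p), b / (1 - kappa_p))

for (t_a, mu_a), (t_b, mu_b) in zip(with_sheet, rescaled):
    print(f"theta: {t_a:+.4f} vs {t_b:+.4f}   mu ratio: {mu_a / mu_b:.4f}")
```

In this toy case the image positions agree exactly and every magnification differs by the same factor $(1-\kappa_p)^{-2}$, so flux ratios are unchanged; only a time delay or an independent mass estimate distinguishes the two models.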

Thus, while the extra convergence can be important for the quantitative 
understanding of time delays or lens galaxy masses, it is only the shear that 
introduces qualitatively new behavior to the lens equations. The effective 
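potential of an SIS lens in an external shear is $\Psi = b\theta + (\gamma/2)\theta^2\cos 2\chi$,
leading to the lens equations

$$\begin{aligned}\beta_1 &= \theta_1(1 - \gamma) - b\,\theta_1/|\theta| \\ \beta_2 &= \theta_2(1 + \gamma) - b\,\theta_2/|\theta|\end{aligned} \tag{B.29}$$

where for $\gamma > 0$ the perturber is due North (or South) of the lens. The inverse
magnification is

$$\mu^{-1} = 1 - \gamma^2 - \frac{b}{\theta}\left(1 - \gamma\cos 2\chi\right) \tag{B.30}$$

where $\theta = (\theta_1, \theta_2) = \theta(\cos\chi, \sin\chi)$.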

The first step in any general analysis of a new lens potential is to locate 
the critical lines and caustics. In this case we can easily solve $\mu^{-1} = 0$ to find
that the tangential critical line

$$\frac{\theta}{b} = \frac{1 - \gamma\cos 2\chi}{1 - \gamma^2} \tag{B.31}$$

is an ellipse whose axis ratio is determined by the amplitude of the shear $\gamma$
and whose major axis points toward the perturber. We call it the tangential
critical line because the associated magnifications are nearly tangential to the
direction to the lens galaxy and because it is a perturbation to the Einstein
ring of a circular lens. The tangential caustic, the image of the critical line on
the source plane, is a curve called an astroid (Fig. B.15; it is not a "diamond"
despite repeated use of the term in the literature). The parametric expression
for the astroid curve is

$$\beta_1 = -\frac{2b\gamma}{1+\gamma}\cos^3\chi = -\beta_+\cos^3\chi, \qquad \beta_2 = +\frac{2b\gamma}{1-\gamma}\sin^3\chi = \beta_-\sin^3\chi \tag{B.32}$$

where the parameter $\chi$ is the same as the angle appearing in the critical curve
(Eqn. B.31) and we have defined $\beta_\pm = 2b\gamma/(1 \pm \gamma)$ for the locations of the
cusp tips on the axes. The astroid consists of 4 cusp caustics on the symmetry
axes of the lens connected by fold caustics with a major axis pointing toward
the perturber. Like the SIS model without any shear, the origin plays the
role of the radial critical line and there is a circular radial pseudo-caustic at
$\beta = b$.
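
The relation between the critical line and the astroid can be verified directly. The sketch below is a minimal check, using only the lens equations (B.29), the inverse magnification (B.30), the critical line (B.31) and the astroid (B.32); the function names and the test values of $b$ and $\gamma$ are illustrative choices.

```python
import math


def lens_map(t1, t2, b, gamma):
    """Source position for image (t1, t2): SIS plus external shear, Eqn. B.29."""
    t = math.hypot(t1, t2)
    return t1 * (1 - gamma) - b * t1 / t, t2 * (1 + gamma) - b * t2 / t


def inv_mag(t1, t2, b, gamma):
    """Inverse magnification, Eqn. B.30."""
    t = math.hypot(t1, t2)
    cos2chi = (t1 * t1 - t2 * t2) / (t * t)
    return 1 - gamma ** 2 - (b / t) * (1 - gamma * cos2chi)


def critical_point(chi, b, gamma):
    """Point on the tangential critical line at azimuth chi, Eqn. B.31."""
    r = b * (1 - gamma * math.cos(2 * chi)) / (1 - gamma ** 2)
    return r * math.cos(chi), r * math.sin(chi)


def astroid_point(chi, b, gamma):
    """Point on the astroid caustic, Eqn. B.32."""
    return (-2 * b * gamma / (1 + gamma) * math.cos(chi) ** 3,
            2 * b * gamma / (1 - gamma) * math.sin(chi) ** 3)


b, gamma = 1.0, 0.1
for chi in (0.0, 0.4, math.pi / 2, 2.0):
    t1, t2 = critical_point(chi, b, gamma)
    mapped = lens_map(t1, t2, b, gamma)
    print(chi, inv_mag(t1, t2, b, gamma), mapped, astroid_point(chi, b, gamma))
```

At each azimuth the inverse magnification vanishes to rounding error and the lens map of the critical point coincides with the astroid point.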

As mentioned earlier, there is no useful general solution for the image 
positions and magnifications. We can, however, solve the equations for a 
source on one of the symmetry axes of the lens. For example, consider a 
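solution on the minor axis of the lens ($\beta_2 = 0$ for $\gamma > 0$). There are two ways
of solving the lens equation to satisfy the criterion. One is to put the images
on the same axis ($\theta_2 = 0$) and the other is to place them on the arc defined
by $0 = 1 + \gamma - b/\theta$. The images with $\theta_2 = 0$ are simply the SIS solutions
corrected for the effects of the shear. Image 1 is defined by

$$\theta_1 = \frac{\beta_1 + b}{1 - \gamma} \quad\text{with}\quad \mu_1^{-1} = (1 - \gamma^2)\,\frac{\beta_1 + \beta_+}{\beta_1 + b} \tag{B.33}$$

and image 2 is defined by

$$\theta_1 = \frac{\beta_1 - b}{1 - \gamma} \quad\text{with}\quad \mu_2^{-1} = (1 - \gamma^2)\,\frac{\beta_1 - \beta_+}{\beta_1 - b}. \tag{B.34}$$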

Image 1 exists if $\beta_1 > -b$; it is a saddle point for $-b < \beta_1 < -\beta_+$ and it
is a minimum for $\beta_1 > -\beta_+$. Image 2 has the reverse ordering. It exists for
$\beta_1 < b$, it is a saddle point for $\beta_+ < \beta_1 < b$ and it is a minimum for $\beta_1 < \beta_+$.
The magnifications of both images diverge when they are on the tangential
critical line ($\beta_1 = -\beta_+$ for image 1 and $\beta_1 = +\beta_+$ for image 2) and approach
zero as they move into the core of the lens ($\beta_1 \to -b$ for image 1 and $\beta_1 \to +b$
for image 2). These two images shift roles as the source moves through the
origin. The other two solutions are both saddle points, and they exist only if
the source lies inside the astroid ($|\beta_1| < \beta_+$ along the axis). The positions of
images 3 (+) and 4 (−) are

$$\theta_1 = -\frac{\beta_1}{2\gamma}, \qquad \theta_2 = \pm\left[\left(\frac{b}{1+\gamma}\right)^2 - \left(\frac{\beta_1}{2\gamma}\right)^2\right]^{1/2} \tag{B.35}$$

and they have equal magnifications

$$\mu_{3,4}^{-1} = -2\gamma(1+\gamma)\left(1 - \frac{\beta_1^2}{\beta_+^2}\right). \tag{B.36}$$

The magnifications of the images diverge when the source reaches the cusp
tip ($|\beta_1| = \beta_+$) and the image lies on the tangential critical curve.

Thus, if we start with a source at the origin we can follow the changes
in the image structure (see Figs. B.15, B.16). With the source at the origin
we see 4 images on the symmetry axes with reasonably high magnifications,
$\sum_i |\mu_i| = (2/\gamma)/(1 - \gamma^2) \sim 10$. It is a generic result that the least magnified
four-image system is found for an on-axis source, and this configuration has
a total magnification of order the inverse of the ellipticity of the gravitational
potential. As we move the source toward the tip of the cusp ($\beta_1 \to \beta_+$,
Fig. B.15), image 1 simply moves out along the symmetry axis with slowly
dropping magnification, while images 2, 3 and 4 move toward a merger on the
tangential critical curve at $\theta = (-b/(1+\gamma), 0)$. Their magnifications steadily rise
and then diverge when the source reaches the cusp. If we move the source
further outward we find only images 1 and 2 with 1 moving outward and
2 moving inward toward the origin. As it approaches the origin, image 2
becomes demagnified and vanishes when $\beta_1 \to b$. Had we done the same
calculation on the major axis (Fig. B.16), there is a qualitative difference. As
we moved image 1 outward along the $\theta_2$ axis, images 3 and 4 would merge
with image 1 when the source reaches the tip of the cusp at $\beta_2 = \beta_-$ rather
than with image 2.

Unfortunately once we move the source off a symmetry axis, there is no
simple solution. It is possible to find the locations of the remaining images
given that two images have merged on the critical line, and this is useful for
determining the mean magnifications of the lensed images, a point we will
return to when we discuss lens statistics in §B.6. Here we simply illustrate
(Fig. B.17) the behavior of the images when we move the source radially
outward from the origin away from the symmetry axes. Rather than three
images merging on the tangential critical line as the source approaches the
tip of a cusp, we see two images merging as the source approaches the fold
caustic of the astroid. This difference, two images merging versus three images
merging, is a generic difference between folds and cusps as discussed in Part 1.
All images in these four-image configurations are restricted to an annulus of
width $\sim \gamma b$ around the critical line, so the mean magnification of all
four-image configurations is also of order $\gamma^{-1}$ (see Finch et al. 2002).
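
The on-axis image sequence can be traced with a few lines of code. This is a sketch under stated assumptions: the source sits on the minor axis ($\beta_2 = 0$, $0 < \gamma < 1$), images 1 and 2 come from the $\theta_2 = 0$ branch, and images 3 and 4 from the arc $1 + \gamma - b/\theta = 0$, which gives $\theta_1 = -\beta_1/(2\gamma)$. Magnifications are evaluated from the general expression (B.30). The function names are hypothetical.

```python
import math


def inv_mag(t1, t2, b, gamma):
    """Inverse magnification of an SIS in external shear, Eqn. B.30."""
    t = math.hypot(t1, t2)
    cos2chi = (t1 * t1 - t2 * t2) / (t * t)
    return 1 - gamma ** 2 - (b / t) * (1 - gamma * cos2chi)


def on_axis_images(beta1, b, gamma):
    """Images (theta1, theta2, mu) for a source at (beta1, 0)."""
    beta_plus = 2.0 * b * gamma / (1.0 + gamma)    # cusp tip on this axis
    positions = []
    if beta1 > -b:
        positions.append(((beta1 + b) / (1.0 - gamma), 0.0))   # image 1
    if beta1 < b:
        positions.append(((beta1 - b) / (1.0 - gamma), 0.0))   # image 2
    if abs(beta1) < beta_plus:                                  # inside the astroid
        t1 = -beta1 / (2.0 * gamma)
        t2 = math.sqrt((b / (1.0 + gamma)) ** 2 - t1 ** 2)
        positions += [(t1, t2), (t1, -t2)]                      # images 3 and 4
    return [(t1, t2, 1.0 / inv_mag(t1, t2, b, gamma)) for t1, t2 in positions]


b, gamma = 1.0, 0.1
for beta1 in (0.0, 0.1, 0.5, 1.2):
    imgs = on_axis_images(beta1, b, gamma)
    print(beta1, len(imgs), sum(abs(mu) for _, _, mu in imgs))
```

For a source at the origin the four unsigned magnifications sum to $(2/\gamma)/(1-\gamma^2)$, and the image count drops from four to two to one as the source crosses the cusp tip and then the radial pseudo-caustic.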

There is one more possibility for the caustic structure of the lens if the
external shear is large enough. For $1/3 < |\gamma| < 1$, the tip of the astroid caustic
extends outside the radial caustic, as shown in Fig. B.18. This allows a new
image geometry, known as the cusp or disk geometry, where we see three images
straddling the major axis of a very flattened potential. It is associated with
the caustic region inside the astroid caustic but outside the radial caustic.
This configuration appears to be rare for lenses produced by galaxies, with
APM08279+5255 as the only likely candidate, but relatively more common
in clusters. The difference is that clusters tend to have shallower density
profiles than galaxies, which shrinks the radial caustics relative to the tangential
caustics to allow more cross section for this image configuration and lower
ellipticity thresholds before it becomes possible (Oguri & Keeton 2004 most
recently, but also see Kochanek & Blandford 1987, Kovner 1987a, Wallington
& Narayan 1993).

In general, it is far more difficult to analyze ellipsoidal lenses, in part 
because few ellipsoidal lenses have analytic expressions for their deflections. 
The exception is the isothermal ellipsoid (Kassiola & Kovner 1993, Kormann,
Schneider & Bartelmann 1994, Keeton & Kochanek 1998), including a core
radius $s$, which is both analytically tractable and generally viewed as the
most likely average mass distribution for gravitational lenses. The surface
density of the isothermal ellipsoid

$$\kappa = \frac{1}{2}\frac{b}{\omega} \quad\text{where}\quad \omega^2 = q^2(\theta_1^2 + s^2) + \theta_2^2 \tag{B.37}$$

depends on the axis ratio $q$ and the core radius $s$. For $q = 1 - \epsilon < 1$ the
major axis is the $\theta_1$ axis and $s$ is the major axis core radius. The deflections
produced by this lens are remarkably simple,

$$\alpha_1 = \frac{b}{(1-q^2)^{1/2}}\tan^{-1}\left[\frac{\theta_1(1-q^2)^{1/2}}{\omega + s}\right], \qquad \alpha_2 = \frac{b}{(1-q^2)^{1/2}}\tanh^{-1}\left[\frac{\theta_2(1-q^2)^{1/2}}{\omega + q^2 s}\right]. \tag{B.38}$$

The effective lens potential is cumbersome but analytic,

$$\Psi = \theta\cdot\alpha - b s \ln\left[(\omega + s)^2 + (1 - q^2)\theta_1^2\right]^{1/2}, \tag{B.39}$$

the magnification is simple

$$\mu^{-1} = 1 - \frac{b}{\omega} + \frac{b^2 s}{\omega\left[(\omega + s)^2 + (1 - q^2)\theta_1^2\right]} \tag{B.40}$$


and becomes even simpler in the limit of a singular isothermal ellipsoid (SIE)
with $s = 0$ where $\mu^{-1} \to 1 - b/\omega$. In this case, contours of surface density $\kappa$ are
also contours of the magnification, and the tangential critical line is the $\kappa = 1/2$
isodensity contour just as for the SIS model.

Fig. B.15. Example of a minor axis cusp on the source (top) and image (bottom)
planes. When the source is inside both the radial and tangential caustics (triangles)
there are four images. As the source moves toward the cusp, three of the images
head towards a merger on the critical line and become highly magnified to leave only
one image once the source crosses the cusp and lies between the two caustics (open
squares). In a minor axis cusp, the image surviving the cusp merger is a saddle
point interior to the critical line. As the source approaches the radial caustic, one
image approaches the center of the lens and then vanishes as the source crosses the
caustic to leave only one image (pentagons).

Fig. B.16. Example of a major axis cusp on the source (top) and image (bottom)
planes. When the source is inside both the radial and tangential caustics (triangles)
there are four images. As the source moves toward the cusp, three of the images
head towards a merger on the critical line and become highly magnified to leave
only one image once the source crosses the cusp and lies between the two caustics
(open squares). In a major axis cusp, the image surviving the cusp merger is the
minimum corresponding to the image we would see in the absence of a lens. As
the source approaches the radial caustic, one image approaches the center of the
lens and then vanishes as the source crosses the caustic to leave only one image
(pentagons).

Fig. B.17. Example of a fold merger on the source (top) and image (bottom)
planes. When the source is inside both the radial and tangential caustics (filled
squares) there are four images. As the source crosses the tangential caustic, two
images merge, become highly magnified and then vanish, leaving only two images
(triangles) when the source is outside the tangential caustic but inside the radial
caustic. As the source approaches the radial caustic, one image moves into the
center of the lens and then vanishes when the source crosses the radial caustic to
leave only one image when the source is outside both caustics (open squares).

Fig. B.18. Example of a cusp or disk image geometry on the source (top) and
image (bottom) planes. The shear is high enough to make the tangential caustic
extend outside the radial caustic. For a source inside both caustics (triangles) we
see a standard four-image geometry as in Fig. B.16. However, for a source outside
the radial caustic but inside the tangential caustic (squares) we have three images
all on one side of the lens. This is known as the cusp geometry because it is always
associated with cusps, and the disk geometry because flattened disks are the only
natural way to produce them. Once the source is outside the cusp tip (pentagon),
a single image remains.

The critical radius scale $b$ can
be related to the circular velocity in the plane of the galaxy relatively easily.
For an isothermal sphere we have that $b_{SIS} = 4\pi(\sigma_v/c)^2 D_{ds}/D_s$ where the
circular velocity is $v_c = \sqrt{2}\sigma_v$. For the projection of a three-dimensional (3D)
oblate ellipsoid of axis ratio $q_3$ and inclination $i$, so that $q^2 = q_3^2\cos^2 i + \sin^2 i$,
the deflection scale is $b = b_{SIS}(e_3/\sin^{-1} e_3)$ where $e_3 = \sqrt{1 - q_3^2}$ is the
eccentricity of the 3D mass distribution. In the limit that $q_3 \to 0$ the model
becomes a Mestel (1963) disk, the infinitely thin disk producing a flat rotation
curve, and $b = 2b_{SIS}/\pi$ (see §B.4.9 and Keeton, Kochanek & Seljak 1997,
Keeton & Kochanek 1998, Chae 2003). At least for the case of a face-on disk,
at fixed circular velocity you get a smaller Einstein radius as you make the
3D distribution flatter because a thin disk requires less mass to produce the
same circular velocity.
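
The isothermal ellipsoid expressions (Eqns. B.37 to B.40) can be checked against one another numerically. The sketch below assumes $q < 1$, takes $\omega^2 = q^2(\theta_1^2 + s^2) + \theta_2^2$, and compares an analytic inverse magnification with one built from finite differences of the deflections. The sign of the core term in `inv_mag` is the one that agrees with the numerical Jacobian. The small helper `b_oblate` encodes the scaling $b = b_{SIS}\,e_3/\sin^{-1}e_3$ quoted above. All function names and test values are illustrative.

```python
import math


def omega(t1, t2, q, s):
    return math.sqrt(q * q * (t1 * t1 + s * s) + t2 * t2)


def deflection(t1, t2, b, q, s):
    """Deflection components of the cored isothermal ellipsoid (requires q < 1)."""
    e = math.sqrt(1.0 - q * q)
    w = omega(t1, t2, q, s)
    return (b / e * math.atan(e * t1 / (w + s)),
            b / e * math.atanh(e * t2 / (w + q * q * s)))


def inv_mag(t1, t2, b, q, s):
    """Analytic inverse magnification; reduces to 1 - b/omega when s = 0."""
    w = omega(t1, t2, q, s)
    return 1.0 - b / w + b * b * s / (w * ((w + s) ** 2 + (1.0 - q * q) * t1 * t1))


def inv_mag_numeric(t1, t2, b, q, s, h=1e-5):
    """Determinant of (1 - d alpha_i / d theta_j) from central differences."""
    a1p, a2p = deflection(t1 + h, t2, b, q, s)
    a1m, a2m = deflection(t1 - h, t2, b, q, s)
    b1p, b2p = deflection(t1, t2 + h, b, q, s)
    b1m, b2m = deflection(t1, t2 - h, b, q, s)
    d11, d21 = (a1p - a1m) / (2 * h), (a2p - a2m) / (2 * h)
    d12, d22 = (b1p - b1m) / (2 * h), (b2p - b2m) / (2 * h)
    return (1.0 - d11) * (1.0 - d22) - d12 * d21


def b_oblate(b_sis, q3):
    """Deflection scale for an oblate 3D axis ratio q3 at fixed circular velocity."""
    e3 = math.sqrt(1.0 - q3 * q3)
    return b_sis if e3 == 0.0 else b_sis * e3 / math.asin(e3)


args = (0.7, 0.4, 1.0, 0.7, 0.1)          # theta1, theta2, b, q, s
print(inv_mag(*args), inv_mag_numeric(*args))
print(b_oblate(1.0, 0.0), 2.0 / math.pi)  # thin-disk limit
```

The two inverse magnifications agree to the accuracy of the finite differences, and the $q_3 \to 0$ limit of `b_oblate` reproduces $2b_{SIS}/\pi$.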

We can generate several other useful models from the isothermal ellip- 
soids. For example, steeper ellipsoidal density distributions can be derived 
by differentiating with respect to $s^2$. The most useful of these is the first
derivative with $\kappa \propto \omega^{-3}$, i.e. $(\omega^2)^{-3/2}$, which is related to the Kuzmin (1956) disk (see
Kassiola & Kovner 1993, Keeton & Kochanek 1998). It is also easy to generate
models with flat inner rotation curves and truncated halos by taking the
difference of two isothermal ellipsoids. In particular if $\kappa(s)$ is an isothermal
ellipsoid with core radius $s$, the model

$$\kappa = \kappa(s) - \kappa(a) \tag{B.41}$$

with $a > s$ has a central core region with a rising rotation curve for $\theta \lesssim s$,
a flat rotation curve for $s \lesssim \theta \lesssim a$ and a dropping rotation curve for
$\theta \gtrsim a$. In the singular limit ($s \to 0$), it becomes the "pseudo-Jaffe model"
corresponding to a 3D density distribution $\rho \propto (r^2 + s^2)^{-1}(r^2 + a^2)^{-1}$ whose
name derives from the fact that it is very similar to the Jaffe model with
$\rho \propto r^{-2}(r + a)^{-2}$ (Kneib et al. 1996, Keeton & Kochanek 1998). We will discuss
other common lens models in §B.4.1.

The last simple analytic models we mention are the generalized singular 
isothermal potentials of the form $\Psi = \theta F(\chi)$ with surface density $\kappa(\theta, \chi) =
(1/2)(F(\chi) + F''(\chi))/\theta$. Both the SIS and SIE are examples of this model.
The generalized isothermal sphere has a number of useful analytic properties.
For example, the magnification contours are isodensity contours

$$\mu^{-1} = 1 - \frac{1}{\theta}\left[F(\chi) + F''(\chi)\right] = 1 - 2\kappa(\theta, \chi) \tag{B.42}$$

with the tangential critical line being the contour with $\kappa = 1/2$, and the
time delays between images depend only on the distances from the images
to the lens center (see Witt, Mao & Keeton 2000, Kochanek, Keeton &
McLeod 2001, Wucknitz 2002, Evans & Witt 2003).

B.4 The Mass Distributions of Galaxies

Contrary to popular belief, the modeling of gravitational lenses to deter- 
mine the mass distribution of a lens is not a "black art." It is, however, an 
area in which the lensing community has communicated results badly. There 
are two main problems. First, many modeling results seem almost deliber- 
ately obfuscatory as to what models were actually used, what data were fit 
and what was actually constrained. Not only do many lens papers insist on 
taking well known density distributions from the dynamical literature and 
assigning them new names simply because they have been projected into two 
dimensions, but they then assign them a plethora of bizarre acronyms. Some- 
times the model used is not actually the one named, for example using tidally 
truncated halos but calling them isothermal models. Second, there is a steady 
confusion between the parameters of models and the aspects of the mass dis- 
tribution that have actually been constrained. Models with apparently very 
different parameters may be in perfect accord as to the properties of the 
mass distribution that are actually relevant to what is observed. Discussions 
of non-parametric mass models then confuse the issue further by conflating 
differences in parameters with differences in what is actually constrained to 
argue for non-parametric models when in fact they also are simply matching 
the same basic properties with lots of extra noise from the additional and 
uninteresting degrees of freedom. In short, the problem with lens modeling is 
not that it is a "black art," but that the practitioners try to make it seem to 
be a "black art" (presumably so that people will believe they need wizards). 
The most important point to take from this section is that any idiot can 
model a lens and interpret it properly with a little thinking about what it is 
that lenses constrain. 

There are two issues to think about in estimating the mass distributions 
of gravitational lenses. The first issue is how to model the mass distribution 
with a basic choice between parametric and non-parametric models. In §B.4.1
we summarize the most commonly used radial mass distributions for lens
models. Ellipsoidal versions of these profiles combined with an external (tidal)
shear are usually used to describe the angular structure, but there has been
recent interest in deviations from ellipsoidal distributions which we discuss
in §B.4.4 and §B.8. In §B.4.7 we summarize the most common approaches
for non-parametric models of the mass distribution. Since this is my review,
I will argue that the parametric models are all that is needed to model lenses 
and that they provide a better basis for understanding the results than non- 
parametric models (but the reader should be warned that if Prasenjit Saha 
was writing this you would probably get a different opinion). 

The second issue is to determine the aspects of the lens data that actually 
constrain the mass distribution. Among the things that can be measured for a 
lens are the relative positions of the components (the astrometric constraints) , 
the relative fluxes of the images, the time delays between the images, the
dynamical properties of the lens galaxy, and the microlensing of the images. Of
these, the most important constraints are the positions. We can usually measure
the relative positions of the lensed components very accurately (5 mas
or better) compared to the arc second scales of the component separations.
Obviously the accuracy diminishes when components are faint, and the usual
worst case is having very bright lensed quasars that make it difficult to detect
the lens galaxy. As we discuss in §B.8, substructure and/or satellites of the
lens galaxy set a lower limit of order 1-5 mas on the accuracy with which it is
safe to impose astrometric constraints, independent of the measurement accuracy.
When the source is extended, the resulting arcs and rings discussed in §B.10 provide
additional constraints. These are essentially astrometric in nature, but are
considerably more difficult to use than multiply imaged point sources. Our
general discussion of how lenses constrain the radial (§B.4.3) and angular
structure (§B.4.4) focuses on the use of astrometric constraints, and in §B.4.6
we discuss the practical details of fitting image positions.

The flux ratios of the images are one of the most easily measured constraints,
but cannot be imposed stringently enough to constrain radial
density profiles because of systematic uncertainties. Flux ratios measured at a
single epoch are affected by time variability in the source (§B.5), microlensing
by the stars in the lens galaxy in the optical continuum (see Part 4),
magnification perturbations from substructure at all wavelengths (see §B.8),
absorption by the ISM of the lens (dust in the optical, free-free in the radio)
and scatter broadening in the radio (see §B.8 and §B.9). Most applications of
flux ratios have focused on using them to probe these perturbing effects rather
than for studying the mean mass distribution of the lens. Where radio sources
have small scale VLBI structures, the changes in the relative astrometry of
the components can constrain the components of the relative magnification
tensors without needing to use any flux information (e.g. Garrett et al. 1994,
Rusin et al. 2002).

Two types of measurements, time delays (§B.5) and microlensing by the
stars or other compact objects in the lens galaxy (Part 4), constrain the surface
density near the lensed images. Microlensing also constrains the fraction of
that surface density that can be in the form of stars. To date, time delays have
primarily been used to estimate the Hubble constant rather than the surface
density, but if we view the Hubble constant as a known quantity, consider
only time delay ratios, or simply want to compare surface densities between
lenses, then time delays can be used to constrain the mass distribution. We
discuss time delays separately because of their close association with attempts
to measure the Hubble constant. Using microlensing variability to constrain
the mass distribution is presently more theory than practice due to a lack
of microlensing light curves for almost all lenses. However, the light curves
of the one well monitored lens, Q2237+0305, appear to require a surface
density composed mainly of stars as we would expect for a lens where we see
the images deep in the bulge of a nearby spiral galaxy (Kochanek 2004). We
will not discuss this approach further in Part 2.

Any independent measurement of the mass of a component will also help
to constrain the structure of the lenses. At present this primarily means
making stellar dynamical measurements of the lens galaxy and comparing
the dynamical mass estimates to those from the lens geometry. We discuss
this in detail in §B.4.9. For lenses associated with clusters, X-ray, weak lensing
or cluster velocity dispersion measurements can provide estimates of the
cluster mass. While this has been done in a few systems (e.g. X-rays, Morgan
et al. 2001, Chartas et al.; weak lensing, Fischer et al. 1997; velocity
dispersions, Angonin-Willaime, Soucail & Vanderriest 1994), the precision of
these mass estimates is not high enough to give strong constraints on lens
models. X-ray observations are probably more important for locating the
positions of groups and clusters relative to the lens than for estimating their
masses.

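The most useful way of thinking about lensing constraints on mass distributions
is in terms of multipole expansions (e.g. Kochanek 1991a, Trotter,
Winn & Hewitt 2000, Evans & Witt 2003, Kochanek & Dalal 2004). An arbitrary
surface density $\kappa(\vec\theta)$ can be decomposed into multipole components,

$$\kappa(\vec\theta) = \kappa_0(\theta) + \sum_{m=1}^{\infty}\left[\kappa_{cm}(\theta)\cos(m\chi) + \kappa_{sm}(\theta)\sin(m\chi)\right] \tag{B.43}$$

where the individual components are angular averages over the surface density

$$\kappa_0(\theta) = \frac{1}{2\pi}\int_0^{2\pi} d\chi\,\kappa(\vec\theta), \quad\text{and}\quad \begin{Bmatrix}\kappa_{cm}(\theta)\\ \kappa_{sm}(\theta)\end{Bmatrix} = \frac{1}{\pi}\int_0^{2\pi} d\chi \begin{Bmatrix}\cos(m\chi)\\ \sin(m\chi)\end{Bmatrix}\kappa(\vec\theta). \tag{B.44}$$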

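The first three terms are the monopole ($\kappa_0$), the dipole ($m = 1$) and the
quadrupole ($m = 2$) of the lens. The Poisson equation $\nabla^2\Psi = 2\kappa$ is separable
in polar coordinates, so a multipole decomposition of the effective potential

$$\Psi(\vec\theta) = \Psi_0(\theta) + \sum_{m=1}^{\infty}\left[\Psi_{cm}(\theta)\cos(m\chi) + \Psi_{sm}(\theta)\sin(m\chi)\right] \tag{B.45}$$

will have terms that depend only on the corresponding multipole of the surface
density, $\nabla^2\left[\Psi_{cm}(\theta)\cos(m\chi)\right] = 2\kappa_{cm}(\theta)\cos(m\chi)$. The monopole of the
potential is simply

$$\Psi_0(\theta) = 2\log(\theta)\int_0^{\theta} u\,du\,\kappa_0(u) + 2\int_{\theta}^{\infty} u\,du\,\log(u)\,\kappa_0(u) \tag{B.46}$$

and its derivative is the bend angle for a circular lens,

$$\alpha_0(\theta) = \frac{d\Psi_0}{d\theta} = \frac{2}{\theta}\int_0^{\theta} u\,du\,\kappa_0(u), \tag{B.47}$$

just as we derived earlier for circular lenses. The higher order multipoles are no more
complicated, with

$$\begin{Bmatrix}\Psi_{cm}(\theta)\\ \Psi_{sm}(\theta)\end{Bmatrix} = -\frac{1}{m\theta^m}\int_0^{\theta} u^{1+m}\,du \begin{Bmatrix}\kappa_{cm}(u)\\ \kappa_{sm}(u)\end{Bmatrix} - \frac{\theta^m}{m}\int_{\theta}^{\infty} u^{1-m}\,du \begin{Bmatrix}\kappa_{cm}(u)\\ \kappa_{sm}(u)\end{Bmatrix}. \tag{B.48}$$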
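
The multipole integrals are simple enough to evaluate by direct quadrature. The following is a minimal sketch, assuming a midpoint rule and the substitution $u = \theta/t$ to map the exterior integral of Eqn. B.48 onto $0 < t \le 1$; the function names and the test profiles ($\kappa_0 = b/2u$ for an SIS, and an illustrative quadrupole $\kappa_{c2} = A/u$) are choices made here, not taken from the text.

```python
def midpoint(f, a, b, n=2000):
    """Midpoint-rule integral of f over [a, b]; never evaluates the endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def alpha0(theta, kappa0):
    """Monopole bend angle, Eqn. B.47."""
    return (2.0 / theta) * midpoint(lambda u: u * kappa0(u), 0.0, theta)


def psi_m(theta, m, kappa_m):
    """Radial function of the m-th multipole of the potential, Eqn. B.48."""
    interior = midpoint(lambda u: u ** (1 + m) * kappa_m(u), 0.0, theta)
    # exterior integral over u in (theta, infinity), rewritten with u = theta / t
    exterior = midpoint(
        lambda t: (theta / t) ** (1 - m) * kappa_m(theta / t) * theta / t ** 2,
        0.0, 1.0)
    return -interior / (m * theta ** m) - theta ** m * exterior / m


b, A, theta = 1.0, 0.3, 1.3
print(alpha0(theta, lambda u: 0.5 * b / u))   # SIS monopole: constant deflection b
print(psi_m(theta, 2, lambda u: A / u))       # compare with -(2/3) * A * theta
```

For $\kappa_{c2} = A/u$ the interior and exterior poles contribute $-A\theta/6$ and $-A\theta/2$, so $\Psi_{c2} = -(2/3)A\theta$, which satisfies the radial Poisson equation $\Psi'' + \Psi'/\theta - 4\Psi/\theta^2 = 2\kappa_{c2}$.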

The angular multipoles are always composed of two parts. There is an
interior pole $\Psi_{cm,int}(\theta)$ due to the multipole surface density interior to $\theta$ (the
integral from $0 < u < \theta$) and an exterior pole $\Psi_{cm,ext}(\theta)$ due to the multipole
surface density exterior to $\theta$ (the integral from $\theta < u < \infty$). The higher order
multipoles produce deflections in both the radial

$$\alpha_{cm,rad} = \frac{d}{d\theta}\left[\Psi_{cm}\cos(m\chi)\right] = \frac{d\Psi_{cm}}{d\theta}\cos(m\chi) \tag{B.49}$$

and tangential

$$\alpha_{cm,tan} = \frac{1}{\theta}\frac{d}{d\chi}\left[\Psi_{cm}\cos(m\chi)\right] = -\frac{m}{\theta}\Psi_{cm}\sin(m\chi) \tag{B.50}$$

directions, where the radial deflection depends on the derivative of $\Psi_{cm}$ and
the tangential deflection depends only on $\Psi_{cm}$. This may seem rather formal,
but the multipole expansion provides the basis for understanding which aspects
of mass distributions will matter for lens models. Obviously it is the
lowest order angular multipoles which are most important. The most common
angular term added to lens models is the external shear

$$\Psi_{2,ext} = \frac{1}{2}\gamma_c\theta^2\cos 2(\chi - \chi_\gamma) + \frac{1}{2}\gamma_s\theta^2\sin 2(\chi - \chi_\gamma) \tag{B.51}$$

with dimensionless amplitudes $\gamma_c$ and $\gamma_s$ and axis $\chi_\gamma$. The external (tidal)
shear and any accompanying mean convergence are the lowest order perturbations
from any object near the lens that have measurable effects on a
gravitational lens (see Eqn. B.26). While models usually consider only external
(tidal) shears where these coefficients are constants, in reality $\gamma_c$, $\gamma_s$ and
$\chi_\gamma$ are functions of radius (i.e. Eqn. B.48). Along with the external shear,
there is an internal shear

$$\Psi_{2,int} = \frac{1}{2}\Gamma_c\frac{\langle\theta\rangle^4}{\theta^2}\cos 2(\chi - \chi_\Gamma) + \frac{1}{2}\Gamma_s\frac{\langle\theta\rangle^4}{\theta^2}\sin 2(\chi - \chi_\Gamma) \tag{B.52}$$

due to the quadrupole moment of the mass interior to a given radius. We
introduce the mean radius of the lensed images $\langle\theta\rangle$ to make $\Gamma_c$ and $\Gamma_s$
dimensionless with magnitudes that can be easily compared to the external
shear amplitudes $\gamma_c$ and $\gamma_s$. Arguably the critical radius of the lens is a better
physical choice, but the mean image radius will be close to the critical
radius and using it avoids any trivial covariances between the internal shear
strength and the monopole mass. Usually the internal quadrupole is added
as part of an ellipsoidal model for the central lens galaxy, but it is useful in
analytic studies to consider it separately.

B.4.1 Common Models for the Monopole

Most attention in modeling lenses focuses on the monopole or radial mass 
distribution of the lenses. Unfortunately, much of the lensing literature uses 
an almost impenetrable array of ghastly non-standard acronyms to describe 
the mass models even though many of them are identical to well-known fam- 
ilies of density distributions used in stellar dynamics. Here we summarize 
the radial mass distributions which are most commonly used and will keep 
reappearing in the remainder of Part 2. 

The simplest possible choice for the mass distribution is to simply trace 
the light. The standard model for early-type galaxies or the bulges of spiral
galaxies is the de Vaucouleurs (1948) profile with surface density

$$\Sigma(R) = I_e \exp\left[-7.67\left((R/R_e)^{1/4} - 1\right)\right], \tag{B.53}$$

where the effective radius $R_e$ encompasses half the total mass (or light) of
the profile. Although the central density of a de Vaucouleurs model is finite,
it actually acts like a rather cuspy density distribution and will generally fit
the early-type lens data with no risk of producing a detectable central image
(e.g. Lehár et al. 2000, Keeton 2003a). The simplest model for a disk galaxy
is an exponential disk,

$$\Sigma(R) = I_0 \exp\left[-R/R_d\right] \tag{B.54}$$

where $R_d$ is the disk scale length. An exponential disk by itself is rarely a
viable lens model because it has so little density contrast between the center 
and the typical radii of images that detectable central images are almost 
always predicted but not observed. Some additional component, either a de 
Vaucouleurs bulge or a cuspy dark matter halo, is always required. This 
makes spiral galaxy lens models difficult because they generically require 
two stellar components (a bulge and a disk) and a dark matter halo, while 
the photometric data are rarely good enough to constrain the two stellar 
components (e.g. Maller, Flores & Primack 1997, Koopmans et al. 1998,
Maller et al. 2000, Trott & Webster 2002, Winn, Hall & Schechter 2003).
Since spiral lenses are already relatively rare, and spiral lens galaxies with 
good photometry are rarer still, less attention has been given to these systems. 
The de Vaucouleurs and exponential disk models are examples of Sersic (1968)
profiles

$$\Sigma(R) = I_0 \exp\left[-b_n \left(R/R_e(n)\right)^{1/n}\right] \tag{B.55}$$

where the effective radius $R_e(n)$ is defined to encompass half the light,
$n = 4$ is a de Vaucouleurs model and $n = 1$ is an exponential disk. These profiles
have not been used as yet for the study of lenses except for some quasar host
galaxy models (§B.10). The de Vaucouleurs model can be approximated (or
the reverse) by the Hernquist (1990) model with the 3D density distribution

$$\rho(r) = \frac{M}{2\pi}\frac{a}{r(a+r)^3} \tag{B.56}$$

where $M$ is the total mass and $a \simeq 0.55R_e$ if matched to a de Vaucouleurs model. For lensing purposes,
the Hernquist model has one major problem. Its $\rho \propto 1/r$ central density cusp
is shallower than the effective cusp of a de Vaucouleurs model, so Hernquist
models tend to predict detectable central images even when the matching de
Vaucouleurs model would not. As a result, the Hernquist model is more often
used as a surrogate for dynamical normalization of the de Vaucouleurs model
than as an actual lens model (see below).
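The requirement that $R_e$ enclose half the light is what fixes the constant in the exponent of Eqns. B.53 and B.55. The following is a minimal sketch that checks this numerically. It assumes the standard result that the light inside $R_e$ is the regularized incomplete gamma function $P(2n, b_n)$, which reduces to a finite sum when $2n$ is an integer; the function names are illustrative and do not come from the text.

```python
import math

def enclosed_light_fraction(n, b):
    """Fraction of the total light of a Sersic profile inside R_e.

    Substituting t = b (R/R_e)^(1/n) in the integral of Sigma(R) 2 pi R dR
    gives the regularized incomplete gamma function P(2n, b).  For integer
    2n this is 1 - exp(-b) * sum_{j<2n} b^j / j!.
    """
    k = int(round(2 * n))
    if abs(k - 2 * n) > 1e-12 or k < 1:
        raise ValueError("this sketch only handles integer 2n >= 1")
    partial_sum = sum(b ** j / math.factorial(j) for j in range(k))
    return 1.0 - math.exp(-b) * partial_sum

def solve_b(n, lo=0.1, hi=50.0, tol=1e-10):
    """Bisect for the b_n that places half the light inside R_e."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if enclosed_light_fraction(n, mid) < 0.5:
            lo = mid   # too little light enclosed: b must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # de Vaucouleurs (n = 4) with the coefficient 7.67 used in Eqn. B.53
    print("fraction inside R_e for n=4, b=7.67:", enclosed_light_fraction(4, 7.67))
    # coefficients for the de Vaucouleurs and exponential (n = 1) cases
    print("b_4 =", solve_b(4), " b_1 =", solve_b(1))
```

With this convention the exponential disk of Eqn. B.54 corresponds to $R_d = R_e/b_1$.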

Theoretical models for lenses started with simple, softened power laws of
the form

$$\kappa(R) \propto (R^2 + s^2)^{-(n-1)/2} \rightarrow R^{1-n} \tag{B.57}$$

in the limit where there is no core radius. We are using these simple power
law lenses in all our examples (see §B.3). These models include many well
known stellar dynamical models such as the singular isothermal sphere (SIS,
$n = 2$, $s = 0$), the modified Hubble profile ($n = 3$) and the Plummer model
($n = 5$). Since we only see the projected mass, these power laws are also
related to common models for infinitely thin disks. The Mestel (1963) disk
($n = 2$, $s = 0$) is the disk that produces a flat rotation curve, and the Kuzmin
(1956) disk ($n = 3$) can be used to mimic the rising and then falling rotation
curve of an exponential disk. The softened power-law models have generally
fallen out of favor other than as simple models for some of the visible components
of lenses because the strong evidence for stellar and dark matter cusps
makes models with core radii physically unrealistic. While ellipsoidal versions
of these models are not available in useful form, there are fast series expansion
methods for numerical models (Chae, Khersonsky & Turnshek 1998,
Barkana 1998).

Most "modern" discussions of galaxy density distributions are based on 
sub-cases of the density distribution 

$$\rho(r) \propto \frac{1}{r^n}\frac{1}{(a^\alpha + r^\alpha)^{(m-n)/\alpha}} \tag{B.58}$$

which has a central density cusp with $\rho \propto r^{-n}$, asymptotically declines as
$\rho \propto r^{-m}$ and has a break in the profile near $r \simeq a$ whose shape depends on $\alpha$
(e.g. Zhao 1997). The most common cases are the Hernquist model ($n = 1$,
$m = 4$, $\alpha = 1$) mentioned above, the Jaffe (1983) model ($n = 2$, $m = 4$,
$\alpha = 1$), the NFW (Navarro, Frenk & White 1996) model ($n = 1$, $m = 3$,
$\alpha = 1$) and the Moore (1998) model ($n = 3/2$, $m = 3$, $\alpha = 1$). We can
view the power-law models either as the limit $n \rightarrow 0$ and $\alpha \rightarrow 2$, or we could
generalize the $r^{-n}$ term to $(r^2+s^2)^{-n/2}$ and consider only regions with $r$ and
$s \ll a$. Projections of these models are similar to surface density distributions
of the form

$$\kappa(R) \propto \frac{1}{R^{n-1}}\frac{1}{(a^\alpha + R^\alpha)^{(m-n)/\alpha}} \tag{B.59}$$

(although the definition of the break radius $a$ changes) with the exception of
the limit $n \rightarrow 1$ where the projection of a 3D density cusp $\rho \propto 1/r$ produces
surface density terms $\kappa \propto \ln R$ that cannot be reproduced by the broken
surface density power law. This surface density model is sometimes called
the Nuker law (e.g. Byun et al. 1996). A particularly useful case for lensing is
the pseudo-Jaffe model with $n = 2$, $m = 4$ and $\alpha = 2$ (where the normal Jaffe
model has $\alpha = 1$) as the only example of a broken power law with simple
analytic deflections even when ellipsoidal because the density distribution is
the difference between two isothermal ellipsoids (see Eqn. B.41). These cuspy
models also allow fast approximate solutions for their ellipsoidal counterparts
(see Chae 2002).

The most theoretically important of these cusped profiles is the NFW 
profile (Navarro et al. 1996) because it is the standard model for dark matter 
halos. Since it is such a common model, it is worth discussing it in a little more 
detail, particularly its peculiar normalization. The NFW profile is normalized 
by the mass $M_{vir}$ inside the virial radius $r_{vir}$, with

$$\rho_{NFW}(r) = \frac{M_{vir}}{4\pi f(c)}\frac{1}{r(r+a)^2} \quad\hbox{and}\quad M_{NFW}(<r) = M_{vir}\frac{f(r/a)}{f(c)} \tag{B.60}$$

where $f(c) = \ln(1+c) - c/(1+c)$ and the concentration $c = r_{vir}/a \simeq 5$
for clusters and $c \simeq 10$ for galaxies. The concentration is a function of mass
whose scaling is determined from N-body simulations. A typical scaling for
a halo at redshift $z$ in an $\Omega_M = 0.3$ flat cosmological model is (Bullock et
al. 2001)

$$c(M) = \frac{9}{1+z}\left(\frac{M_{vir}}{8\times 10^{12} h M_\odot}\right)^{-0.14} \tag{B.61}$$

with a dispersion in $\log c$ of $\sigma_{\log(c)} \simeq 0.18$ dex. Because gravitational lensing
is very sensitive to the central density of the lens, including the scatter
in the concentration is quantitatively important for lensing by NFW halos
(Keeton 2001b). The virial mass and radius are related and determined by
the overdensity $\Delta_{vir}(z)$ required for a halo to collapse given the cosmological
model and the redshift. This can be approximated by

$$M_{vir} = \frac{4\pi}{3}\Delta_{vir}(z)\rho_u(z)r_{vir}^3 \simeq 0.23\times 10^{12}\left(\frac{h\,r_{vir}}{100\,\hbox{kpc}}\right)^3\left(\frac{\Omega_M\Delta_{vir}(1+z)^3}{200}\right)h^{-1}M_\odot \tag{B.62}$$

where $\rho_u(z) = 3H_0^2\Omega_M(1+z)^3/8\pi G$ is the mean matter density when the halo
forms and $\Delta_{vir} \simeq (18\pi^2 + 82x - 39x^2)/\Omega(z)$ with $x = \Omega - 1$ is the overdensity
needed for a halo to collapse. There are differences in normalizations between
authors and with changes in the central cusp exponent $\gamma$, but models of this
type are what we presently expect for the structure of dark matter halos
around galaxies.
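The NFW normalization is easy to get wrong, so a short numerical sketch may help. It implements $f(c)$, the density and enclosed mass of Eqn. B.60, the concentration scaling of Eqn. B.61 and the virial overdensity, and includes a brute-force shell integral to confirm that the density and enclosed mass are mutually consistent. The function names, the default $h = 0.7$ and the reading of the mass scale as $8\times10^{12}h\,M_\odot$ are assumptions made for illustration.

```python
import math

def f(c):
    """NFW normalization function f(c) = ln(1 + c) - c / (1 + c)."""
    return math.log(1.0 + c) - c / (1.0 + c)

def nfw_density(r, m_vir, r_vir, c):
    """rho_NFW(r) from Eqn. B.60; a = r_vir / c is the break radius."""
    a = r_vir / c
    return m_vir / (4.0 * math.pi * f(c)) / (r * (r + a) ** 2)

def nfw_enclosed_mass(r, m_vir, r_vir, c):
    """M_NFW(<r) from Eqn. B.60."""
    a = r_vir / c
    return m_vir * f(r / a) / f(c)

def concentration(m_vir, z=0.0, h=0.7):
    """Mean concentration from Eqn. B.61, with m_vir in solar masses."""
    return 9.0 / (1.0 + z) * (m_vir / (8.0e12 * h)) ** (-0.14)

def delta_vir(omega_z):
    """Virial overdensity relative to the mean matter density, x = Omega - 1."""
    x = omega_z - 1.0
    return (18.0 * math.pi ** 2 + 82.0 * x - 39.0 * x ** 2) / omega_z

def shell_integrated_mass(r, m_vir, r_vir, c, steps=20000):
    """Midpoint-rule integral of 4 pi u^2 rho(u) du from 0 to r (a cross-check)."""
    du = r / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        total += 4.0 * math.pi * u ** 2 * nfw_density(u, m_vir, r_vir, c) * du
    return total

if __name__ == "__main__":
    m_vir, r_vir = 1.0e12, 200.0          # illustrative halo: solar masses, kpc
    c = concentration(m_vir)
    print("concentration:", c)
    print("mass inside 50 kpc:", nfw_enclosed_mass(50.0, m_vir, r_vir, c))
    print("Delta_vir for Omega = 0.3:", delta_vir(0.3))
```

Drawing $\log c$ from a Gaussian of width 0.18 dex around `concentration(m_vir)` would be one simple way to include the scatter mentioned above.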

For most lenses, HST imaging allows us to measure the spatial distribution 
of the stars, thereby providing us with a model for the distribution of stellar 
mass with only the stellar mass-to-light ratio as a parameter. For present purposes,
gradients in the stellar mass-to-light ratio are unimportant compared
to the uncertainties arising from the dark matter. Unless we are prepared to
abandon the entire paradigm for modern cosmology, the luminous galaxy is
embedded in a dark matter halo and we must decide how to model the overall
mass distribution. The most common approach, as suggested by the rich
variety of mass profiles we introduced in §B.4.1, is to assume a parametric
form for the total mass distribution rather than attempting to decompose it 
into luminous and dark components. The alternative is to try to embed the 
stellar component in a dark matter halo. Operationally, doing so is trivial - 
the lens is simply modeled as the sum of two mass components. However, 
there are theoretical models for how CDM halos should be combined with 
the stellar component. 

Most non-gravitational lensing applications focus on embedding disk galaxies
in halos because angular momentum conservation provides a means of
estimating a baryonic scale length (e.g. Mo, Mao & White 1998). The spin
parameter of the halo sets the angular momentum of the baryons, and the
final disk galaxy is defined by the exponential disk with the same angular
momentum. As the baryons become more centrally concentrated, they pull
the dark matter inwards as well through a process known as adiabatic contraction
(Blumenthal et al. 1986). The advantage of this approach, which in
lensing has been used only by Kochanek & White (2001), is that it allows a
full ab initio calculation of lens statistical properties when combined with a
model for the cooling of the baryons (see §B.7). It has the major disadvantage
that most lens galaxies are early-type galaxies rather than spirals, and that
there is no analog of the spin parameter and angular momentum conservation
to set the scale length of the stellar component in a model for an early-type
galaxy.

Models of early-type galaxies embedded in CDM halos have to start with 
an empirical estimate of the stellar effective radius. In models of individual 
lenses this is a measured property of the lens galaxy (e.g. Rusin et al. 2003,
2004, Koopmans & Treu 2002, Kochanek 2003a). Statistical models must
use a model for the scaling of the effective radius with luminosity or other
observable parameters of early-type galaxies (e.g. Keeton 2001b). From the
luminosity, a mass-to-light ratio is used to estimate the stellar mass. If all 
baryons have cooled and been turned into stars, then the stellar mass provides 
the total baryonic mass of the halo, otherwise the stellar mass sets a lower 
bound on the baryonic mass. Combining the baryonic mass with an estimate 
of the baryonic mass fraction yields the total halo mass to be fed into the 
model for the CDM halo. 

In general, there is no convincing evidence favoring either approach - for 
the regions over which the mass distributions are constrained by the data, 
both approaches will agree on the overall mass distribution. However, there 
can be broad degeneracies in how the total mass distribution is decomposed 
into luminous and dark components (see §B.4.6).

Fig. B.19. Dependence of the shear generated by other objects along the line of
sight for both linear (light lines) and non-linear (heavy lines) power spectra. (a)
Shows the logarithmic contribution to the rms effective shear for a source at redshift
$z_s = 3$ as a function of wave vector $k$. (b) Shows the dependence on $\sigma_8$ for a fixed
power spectrum shape $\Omega_M h = 0.25$. (c) Shows the dependence on the shape $\Omega_M h$
with $\sigma_8 = 0.6$ for $\Omega_M = 1$ and $\sigma_8 = 1.0$ for $\Omega_M < 1$. (d) Shows the variation in the
shear with source redshift for the models in (c) with $\Omega_M h = 0.25$.



B.4.2 The Effective Single Screen Lens 

Throughout these notes we will treat lenses as if all the lens components 
lay at a single redshift ("the single screen approximation"). The lens equations
for handling multiple deflection screens (e.g. Blandford & Narayan 1986,
Kovner 1987b, Barkana 1996) are known but little used except for numerical
studies (e.g. Kochanek & Apostolakis 1988, Möller & Blain 2001) in large
part because few lenses require multiple lens galaxies at different redshifts
with the exception of B2114+022 (Chae, Mao & Augusto 2001). In fact, we
are not being as cavalier in making this approximation as it may seem.

The vast majority of strong lenses consist of a single lens galaxy perturbed
by other objects. We can divide these objects into those near the primary
lens, where a single screen is clearly appropriate, and those distributed along
the line of sight for which a single screen may be inappropriate. Because the
correlation function is so strong on small scales, the perturbations are dominated
by objects within a correlation length of the lens galaxy (e.g. Keeton,
Kochanek & Seljak 1997, Holder & Schechter 2003). The key to the relative
safety of the single screen model is that weak perturbations from objects
along the line of sight, in the sense that in a multi-screen lens model they
could be treated as a convergence and a shear, can be reduced to a single
"effective" lens plane in which the true amplitudes of the convergence and
shear are rescaled by distance ratios to convert them from their true redshifts
to the redshift of the single screen (Kovner 1987b, Barkana 1996). The lens
equation on the effective single screen takes the form

$$\beta = (I + F_{OS})\theta - (I + F_{LS})\alpha\left[(I + F_{OL})\theta\right] \tag{B.63}$$

where $F_{OS}$, $F_{LS}$ and $F_{OL}$ describe the shear and convergence due to perturbations
between the observer and the source, the lens and the source and the
observer and the lens respectively. For statistical calculations this can be simplified
still further by making the coordinate transformation $\theta' = (I + F_{OL})\theta$
and $\beta' = (I + F_{LS})\beta$ to leave a lens equation,

$$\beta' = (I + F_e)\theta' - \alpha[\theta'], \tag{B.64}$$

identical to a single screen lens in an effective convergence and shear of
$F_e = F_{OL} + F_{LS} - F_{OS}$ (to linear order). In practice it will usually be safe
to neglect the differences between Eqns. B.63 and B.64 because the shearing
terms affecting the deflections in Eqn. B.63 are easily mimicked by modest
changes in the ellipticity and orientation of the primary lens. The rms amplitudes
of these perturbations depend on the cosmological model and the
amplitude of the non-linear power spectrum, but the general scaling is that
the perturbations grow as $D_s^{3/2}$ with source redshift, and increase for larger
$\sigma_8$ and $\Omega_M$ as shown in Fig. B.19 from Keeton et al. (1997). The importance
of these effects is very similar to concerns about the effects of lenses along
the line of sight on the brightness of high redshift supernovae being used to
estimate the cosmological model (e.g. Dalal et al. 2003).

B.4.3 Constraining the Monopole

The most frustrating aspect of lens modeling is that it is very difficult to
constrain the monopole. If we take a simple lens and fit it with any of the
parametric models from §B.4.1 it will be possible to obtain a good fit provided
the central surface density of the model is high enough to avoid the
formation of a central image. As usual, it is simplest to begin understanding
the problem with a circular, two-image lens whose images lie at radii $\theta_A$ and
$\theta_B$ from the lens center (Fig. B.20). The lens equation (B.4) constrains the
deflections so that the two images correspond to the same source position,

$$\beta = \theta_A - \alpha(\theta_A) = -\theta_B + \alpha(\theta_B), \tag{B.65}$$

where the sign changes appear because the images are on opposite sides of the
lens. Recall that for the power-law lens model, $\alpha(\theta) = b^{n-1}\theta^{2-n}$ (Eqn. B.9),
so we can easily solve the constraint equation to determine the Einstein radius
of the lens,

$$b = \left[\frac{\theta_A + \theta_B}{\theta_A^{2-n} + \theta_B^{2-n}}\right]^{1/(n-1)} \tag{B.66}$$

in terms of the image positions. In the limit of an SIS ($n = 2$) the Einstein
radius is the arithmetic mean, $b = (\theta_A + \theta_B)/2$, and in the limit of a point
mass ($n \rightarrow 3$), it is the geometric mean, $b = (\theta_A\theta_B)^{1/2}$, of the image radii.
More generally, for any deflection profile $\alpha(\theta) = bf(\theta)$, the two images simply
determine the mass scale $b = (\theta_A + \theta_B)/(f(\theta_A) + f(\theta_B))$.
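As a quick illustration of Eqn. B.66, the sketch below solves for the Einstein radius of a power-law lens from two image radii and confirms that both images then map back to the same source position. The function names and the sample image radii are illustrative choices, not values from any particular lens.

```python
def deflection(theta, b, n):
    """Power-law deflection alpha(theta) = b^(n-1) theta^(2-n)."""
    return b ** (n - 1.0) * theta ** (2.0 - n)

def einstein_radius(theta_a, theta_b, n):
    """Einstein radius from Eqn. B.66 for image radii theta_a > theta_b."""
    if n == 1.0:
        raise ValueError("n = 1 (constant surface density) has no solution of this form")
    ratio = (theta_a + theta_b) / (theta_a ** (2.0 - n) + theta_b ** (2.0 - n))
    return ratio ** (1.0 / (n - 1.0))

def source_positions(theta_a, theta_b, b, n):
    """Source position implied by each image through Eqn. B.65."""
    beta_a = theta_a - deflection(theta_a, b, n)
    beta_b = -theta_b + deflection(theta_b, b, n)
    return beta_a, beta_b

if __name__ == "__main__":
    theta_a, theta_b = 1.5, 0.5   # arcsec, illustrative
    for n in (1.5, 2.0, 2.5, 3.0):
        b = einstein_radius(theta_a, theta_b, n)
        print(n, b, source_positions(theta_a, theta_b, b, n))
```

Every exponent reproduces the two image positions exactly, which is the degeneracy discussed in the text; only the inferred $b$ and source position change with $n$.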

There are two important lessons here. First, the location of the tangential 
critical line is determined fairly accurately independent of the mass profile. 
We may only be able to determine the mass scale, but it is the most accurate 
measurement of galaxy masses available to astronomy. The dependence of 
the mass inside the Einstein radius on the shape of the deflection profile is 
weak, with fractional differences between profiles being of order $(\delta\theta/\langle\theta\rangle)^2/8$
where $\delta\theta = \theta_A - \theta_B$ and $\langle\theta\rangle = (\theta_A + \theta_B)/2$ (i.e. if the images have similar
radii, the difference between the arithmetic and geometric mean is small).
Second, it is going to be very difficult to determine radial mass distributions. 
In this example there is a perfect degeneracy between the exact location of 
the tangential critical line b and the exponent n. In theory, this is broken by 
the flux ratio of the images. However, a simple two-image lens has too few 
constraints even with perfectly measured flux ratios because a realistic lens 
model must also include some freedom in the angular structure of the lens. 
For a simple four-image lens, there begin to be enough constraints but the 
images all have similar radii, making the flux ratios relatively insensitive to 
changes in the monopole. Combined with the systematic uncertainties in flux 
ratios, they are not useful for this purpose. 

Fig. B.20. A schematic diagram of a two-image lens. The lens galaxy lies at the
origin with two images A and B at radii $\theta_A$ and $\theta_B$ from the lens center. The images
define an annulus of average radius $\langle\theta\rangle = (\theta_A + \theta_B)/2$ and width $\delta\theta = \theta_A - \theta_B$,
and they subtend an angle $\Delta\chi_{AB}$ relative to the lens center. For a circular lens
$\Delta\chi_{AB} = 180°$ by symmetry.

This example also leads to the major misapprehension about lens models
and radial mass distributions, in that the constraints appear to lead to a
degeneracy related to the global structure of the potential (i.e. the exponent
$n$). This is not correct. The degeneracy is a purely local one that depends only
on the structure of the lens in the annulus defined by the images, $\theta_B < \theta < \theta_A$,
as shown in Fig. B.20. To see this we will rewrite the expression for the bend
angle (Eqn. B.3) as

$$\alpha(\theta) = \frac{2}{\theta}\int_0^\theta u\,du\,\kappa(u) = \frac{1}{\theta}\left[b_B^2 + 2\int_{\theta_B}^\theta u\,du\,\kappa(u)\right] = \frac{1}{\theta}\left[b_B^2 + (\theta^2 - \theta_B^2)\langle\kappa\rangle(\theta,\theta_B)\right] \tag{B.67}$$

where $b_B^2 \equiv 2\int_0^{\theta_B} u\,du\,\kappa(u)$ defines the Einstein radius $b_B$ of the total mass interior to
image B, and

$$\langle\kappa\rangle(\theta,\theta_B) = \frac{2}{\theta^2 - \theta_B^2}\int_{\theta_B}^\theta u\,du\,\kappa(u) \tag{B.68}$$

is the mean surface density in the annulus $\theta_B < u < \theta$. If we now solve the
constraint Eqn. B.65 again, we find that

$$b_B^2 = \theta_A\theta_B - \langle\kappa\rangle_{AB}\,\theta_B(\theta_A - \theta_B) \tag{B.69}$$

where $\langle\kappa\rangle_{AB} = \langle\kappa\rangle(\theta_A,\theta_B)$ is the mean density in the annulus $\theta_B < \theta < \theta_A$
between the images. Thus, there is a degeneracy between the total mass 
interior to image B and the mean surface density (mass) between the two 
images. There is no dependence on the distribution of the mass interior to
$\theta_B$, the distribution of mass between the two images, or on either the amount
or distribution of mass exterior to $\theta_A$. This is Gauss' law for gravitational
lens models.

Fig. B.21. Softened power law and cusped model fits to the images produced by
an SIS lens with Einstein radius b = 1.″0 and two source components located 0.″1
and 0.″5 from the lens center. In the top panel, the contours show the regions with
astrometric fit residuals per image of 0.″003 and 0.″010. Models with $m = 3$ cusps
so closely overlie the $m = 4$ models that their error contours were not plotted.
The bottom panel shows the deflection profiles of the best models at half-integer
increments in the exponent $n$. The SIS model has a constant deflection, and the
power-law and cusp models approach it in a sequence of slowly falling deflection
profiles. All models agree with the SIS Einstein radius at r = 1.″0. The positions of
the images are indicated by the vertical bars.
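The local nature of the degeneracy in Eqn. B.69 can be seen numerically. The sketch below fits power-law lenses of different exponents to the same pair of images and, for each, evaluates the mass interior to image B and the mean surface density in the annulus. It uses the fact that $\theta\alpha(\theta)$ is the enclosed mass in these units (Eqn. B.67); the function names and image radii are illustrative assumptions.

```python
def annulus_decomposition(theta_a, theta_b, n):
    """Return (b, b_B^2, <kappa>_AB) for the power-law lens that fits two images."""
    # Einstein radius from the two-image constraint (Eqn. B.66).
    b = ((theta_a + theta_b) /
         (theta_a ** (2.0 - n) + theta_b ** (2.0 - n))) ** (1.0 / (n - 1.0))
    # theta * alpha(theta) is the mass (as an Einstein radius squared) inside theta.
    mass_a = b ** (n - 1.0) * theta_a ** (3.0 - n)
    mass_b = b ** (n - 1.0) * theta_b ** (3.0 - n)
    # Mean surface density between the images, from Eqn. B.67.
    mean_kappa = (mass_a - mass_b) / (theta_a ** 2 - theta_b ** 2)
    return b, mass_b, mean_kappa

def gauss_law_mass(theta_a, theta_b, mean_kappa):
    """Right-hand side of Eqn. B.69."""
    return theta_a * theta_b - mean_kappa * theta_b * (theta_a - theta_b)

if __name__ == "__main__":
    theta_a, theta_b = 1.5, 0.5   # arcsec, illustrative
    for n in (1.5, 2.0, 2.5, 3.0):
        b, mass_b, mean_kappa = annulus_decomposition(theta_a, theta_b, n)
        print(n, b, mass_b, mean_kappa, gauss_law_mass(theta_a, theta_b, mean_kappa))
```

Each model trades mass interior to image B against surface density in the annulus along the single relation of Eqn. B.69, regardless of how the mass is distributed elsewhere.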

If we normalize the mass scale at any point in the interior of the annulus
then the result will appear to depend on the distribution of the mass simply
because the mass must be artificially divided. For example, suppose we model
the surface density locally as a power law $\kappa \propto \theta^{1-n}$ with a mean surface
density $\langle\kappa\rangle$ in the annulus $\theta_B < \theta < \theta_A$ between the images. The mass inside
the mean image radius $\langle\theta\rangle$ is

$$b^2_{\langle\theta\rangle} = \theta_A\theta_B(1-\kappa_0) + \langle\kappa\rangle(\delta\theta)^2\left[\frac{n}{4} + \frac{(4-n)(2-n)(1-n)}{192}\left(\frac{\delta\theta}{\langle\theta\rangle}\right)^2 + \cdots\right] \tag{B.70}$$

where we have expanded the result in the ratio $\delta\theta/\langle\theta\rangle$ (in fact, the result
as shown is exact for $n = 2/3$, 1, 2, 4 and 5). We included in this result an
additional, global convergence $\kappa_0$ so that we can contrast the local degeneracies
due to the distribution of matter between the images with the global
degeneracies produced by an infinite mass sheet. The leading term $\theta_A\theta_B$ is
the square of the Einstein radius expected for a point mass lens (Eqn. B.65). While the
total enclosed mass ($\theta_A\theta_B$) is fixed, the mass associated with the lens galaxy
$b^2_{\langle\theta\rangle}$ must be modified in the presence of a global convergence by the usual
$1 - \kappa_0$ factor created by the mass sheet degeneracy (Falco, Gorenstein &
Shapiro 1985). The structure of the lens in the annulus leads to fractional
corrections to the mass of order $(\delta\theta/\langle\theta\rangle)^2$ that are proportional to $n\langle\kappa\rangle$ to
lowest order.
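The expansion in Eqn. B.70 can be compared with a direct evaluation. The sketch below assumes a local power law $\kappa \propto \theta^{1-n}$ in the annulus and treats $\kappa_0$ as a uniform sheet that adds $\kappa_0\theta$ to the deflection, so that the two-image constraint becomes $(1-\kappa_0)\theta_A\theta_B = b_B^2 + \langle\kappa\rangle\theta_B(\theta_A-\theta_B)$. That generalization of Eqn. B.69, and the function names, are assumptions made for this illustration.

```python
def mass_inside_mean_radius_exact(theta_a, theta_b, n, mean_kappa, kappa0=0.0):
    """Galaxy mass (as an Einstein radius squared) inside <theta>, evaluated directly."""
    p = 3.0 - n
    if p == 0.0:
        raise ValueError("n = 3 needs the logarithmic limit, not handled in this sketch")
    mean_theta = 0.5 * (theta_a + theta_b)
    # Mass interior to image B from the two-image constraint with a sheet kappa0.
    mass_b = ((1.0 - kappa0) * theta_a * theta_b
              - mean_kappa * theta_b * (theta_a - theta_b))
    # Fraction of the annulus mass lying between theta_B and <theta>.
    frac = (mean_theta ** p - theta_b ** p) / (theta_a ** p - theta_b ** p)
    return mass_b + mean_kappa * (theta_a ** 2 - theta_b ** 2) * frac

def mass_inside_mean_radius_series(theta_a, theta_b, n, mean_kappa, kappa0=0.0):
    """Truncated expansion quoted as Eqn. B.70."""
    mean_theta = 0.5 * (theta_a + theta_b)
    dtheta = theta_a - theta_b
    ratio2 = (dtheta / mean_theta) ** 2
    bracket = n / 4.0 + (4.0 - n) * (2.0 - n) * (1.0 - n) / 192.0 * ratio2
    return (1.0 - kappa0) * theta_a * theta_b + mean_kappa * dtheta ** 2 * bracket

if __name__ == "__main__":
    for n in (1.0, 2.0, 2.5, 5.0):
        exact = mass_inside_mean_radius_exact(1.5, 0.5, n, 0.5, 0.1)
        series = mass_inside_mean_radius_series(1.5, 0.5, n, 0.5, 0.1)
        print(n, exact, series, exact - series)
```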

Only if you have additional images inside the annulus can you begin to 
constrain the structure of the density in the annulus. The constraint is not, 
unfortunately, a simple constraint on the density. Suppose that we see an 
additional (pair) of images on the Einstein ring at $\theta_0$, with $\theta_B < \theta_0 < \theta_A$.
This case is simpler than the general case because it divides our annulus into
two sub-annuli (from $\theta_B$ to $\theta_0$ and from $\theta_0$ to $\theta_A$) rather than three. Since
we put the extra image on the Einstein ring, we know that the mean surface
density interior to $\theta_0$ is unity (Eqn. B.11). The A and B images then constrain
a ratio

$$\frac{1-\langle\kappa\rangle_{B0}}{1-\langle\kappa\rangle_{A0}} = \frac{\theta_B}{\theta_A}\,\frac{\theta_A^2-\theta_0^2}{\theta_0^2-\theta_B^2} \tag{B.71}$$

of the average surface densities between the Einstein ring and image B ($\langle\kappa\rangle_{B0}$)
and the Einstein ring and image A ($\langle\kappa\rangle_{A0}$). Since a physical distribution must
have $0 < \langle\kappa\rangle_{A0} < \langle\kappa\rangle_{B0}$, the surface density in the inner sub-annulus must
satisfy

$$\frac{\theta_A+\theta_B}{\theta_A}\,\frac{\theta_0^2-\theta_A\theta_B}{\theta_0^2-\theta_B^2} < \langle\kappa\rangle_{B0} < 1 \tag{B.72}$$

where the lower (upper) bound is found when the density in the outer sub-annulus
is zero (when $\langle\kappa\rangle_{B0} = \langle\kappa\rangle_{A0}$). The term $\theta_0^2 - \theta_A\theta_B$ is the difference
between the squares of the measured critical radius $\theta_0$ and the critical radius implied by
the other two images for a lens with no density in the annulus (e.g. a point
mass), $(\theta_A\theta_B)^{1/2}$. Suppose we actually have images formed by an SIS, so
$\theta_A = \theta_0(1 + x)$ and $\theta_B = \theta_0(1 - x)$ with $0 < x = \beta/\theta_0 < 1$, then the lower
bound on the density in the inner sub-annulus is

$$\langle\kappa\rangle_{B0} > \frac{2x}{(1+x)(2-x)} \tag{B.73}$$

and the fractional uncertainty in the surface density is unity for images near
the Einstein ring ($x \rightarrow 0$) and then steadily diminishes as the A and B
images are more asymmetric. If you want to constrain the monopole, the
more asymmetric the configuration the better. This rule becomes still more
important with the introduction of angular structure.
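To see how the allowed range in Eqn. B.72 narrows as the images become more asymmetric, the sketch below evaluates the lower bound for images produced by an SIS and compares it with the true mean density of the inner sub-annulus. The true value, $1/(2-x)$, follows from assuming the SIS surface density $\kappa = \theta_0/(2\theta)$; that assumption and the function names are made for illustration.

```python
def kappa_inner_lower_bound(theta_a, theta_b, theta_0):
    """Lower bound on <kappa>_B0 from Eqn. B.72 (outer sub-annulus empty)."""
    return ((theta_a + theta_b) / theta_a
            * (theta_0 ** 2 - theta_a * theta_b)
            / (theta_0 ** 2 - theta_b ** 2))

def sis_bounds(x, theta_0=1.0):
    """Return (lower bound, true value, upper bound) of <kappa>_B0 for an SIS.

    The images sit at theta_0 (1 + x) and theta_0 (1 - x), with 0 < x < 1.
    """
    theta_a = theta_0 * (1.0 + x)
    theta_b = theta_0 * (1.0 - x)
    lower = kappa_inner_lower_bound(theta_a, theta_b, theta_0)
    # Mean of kappa = theta_0 / (2 theta) between theta_B and theta_0.
    true_value = 1.0 / (2.0 - x)
    return lower, true_value, 1.0

if __name__ == "__main__":
    for x in (0.1, 0.3, 0.5, 0.7, 0.9):
        lower, true_value, upper = sis_bounds(x)
        print(x, lower, true_value, upper, upper - lower)
```

The width of the allowed interval shrinks steadily with increasing $x$, which is the quantitative content of the rule that asymmetric configurations constrain the monopole best.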

Fig. B.21 illustrates these issues. We arbitrarily picked a model consisting
of an SIS lens with two sources. One source is close to the origin and produces
images at $\theta_A$ = 1.″1 and $\theta_B$ = 0.″9. The other source is farther from the origin
with images at $\theta_A$ = 1.″5 and $\theta_B$ = 0.″5. We then modeled the lens with either
a softened power law (Eqn. B.57) or a three-dimensional cusp (Eqn. B.58).
We did not worry about the formation of additional images when the core 
radius becomes too large or the central cusp is too shallow - this would rule 
out models with very large core radii or shallow central cusps. If there were 
only a single source, either of these models can fit the data for any values 
of the parameters. Once, however, there are two sources, most of parameter 
space is ruled out except for degenerate tracks that look very different for 
the two mass models. Along these tracks, the models satisfy the additional 
constraint on the surface density given by Eqn. B.71. The first point to make
about Fig. B.21 is the importance of carefully defining parameters. The input
SIS model has very different parameters for the two mass models - while the
exponent $n = 2$ is the same in both cases, the SIS model is the limit $s \rightarrow 0$
for the core radius in the softened power law, but it is the limit $a \rightarrow \infty$ for
the break radius in the cusp model. Similarly, models with an inner cusp
$n = 0$ will closely resemble power law models whose exponent $n$ matches the
outer exponent $m$ of the cuspy models. Our frequent failure to explain these
similarities is one reason why lens modeling seems so confusing. The second
point to make about Fig. B.21 is that the deflection profiles implied by these
models are fairly similar over the annulus bounded by the images. Outside the 
annulus, particularly at smaller radii, they start to show very large fractional 
differences. Only if we were to add a third set of multiple images or measure a 
time delay with a known value of $H_0$ would the parameter degeneracy begin
to be broken. 

These general results show that studies of how lenses constrain the monopole 
need the ability to simultaneously vary the mass scale, the surface density 
of the annulus and possibly the slope of the density profile in the annulus to
have the full range of freedom permitted by the data. Most parametric studies
constraining the monopole have had two parameters, adjusting the mass
scale and a correlated combination of the surface density and slope (e.g.
Kochanek 1995a, Impey et al. 1998, Chae, Turnshek & Khersonsky 1998,
Barkana et al. 1999, Chae 1999, Cohn et al. 2001, Muñoz et al. 2001, Wucknitz
et al. 2004), although there are exceptions using models with additional
degrees of freedom (e.g. Bernstein & Fischer 1999, Keeton et al. 2000, Trott
& Webster 2002, Winn, Rusin & Kochanek 2003). This limitation is probably
not a major handicap, because realistic density profiles show a rather limited 
range of local logarithmic slopes. 

B.4.4 The Angular Structure of Lenses 

Assuming you have identified all the halos needed to model a particular lens, 
there are three sources of angular structure in the potential. The first source 
is the shape of the luminous lens galaxy, the second source is the dark matter 
in the halo of the lens, and the third source is perturbations from nearby 
objects or objects along the line of sight. Of these, the only one which is 
easily normalized is the contribution from the stars in the lens galaxy, since 
it must be tightly connected to the monopole deflection of the stars. The 
observed axis ratios of early-type galaxies show a deficit of round galaxies, 
a plateau for axis ratios from q ~ 0.9 to q ~ 0.5 and then a sharp decline 
beyond q ~ 0.5 (e.g. Khairul & Ryden 2002). Not surprisingly, the true elliptical
galaxies are rounder than the lenticular (S0) galaxies even if both
are grouped together as early-type galaxies. In three dimensions, the stellar
distributions are probably close to oblate with very modest triaxialities (e.g.
Franx et al. 1991). Theoretical models of galaxy formation predict ellipticities
and triaxialities larger than observed for luminous galaxies and show
that the shapes of the dark matter halos are significantly modified by the
cooling baryons (Dubinski 1992, 1994, Warren et al. 1992, Kazantzidis et
al. 2004). Local estimates of the shape of dark matter halos are very limited
(e.g. Olling & Merrifield 2001, Buote et al. 2002). Stellar isophotes also show
deviations from perfect ellipses (e.g. Bender et al. 1989, Rest et al. 2001) and
the deviations of simulated halos from ellipses have a similar amplitude (Heyl
et al. 1994, Burkert & Naab 2003).

It is worth considering two examples to understand the relative importance
of the higher order multipoles of a lens. The first is the singular isothermal
ellipsoid (SIE) introduced in §B.3 (Eqns. B.38-B.40). Let the major axis
of the model lie on the $\theta_1$ axis, in which case only the $\cos(m\chi)$ multipoles
with $m = 2, 4, \cdots$ are non-zero. All non-zero poles also have the same radial
dependence, with $\kappa_{cm} = A_m/\theta$ and $\Psi_{cm} = -2A_m\theta/(m^2 - 1)$. The ratio of the
internal to the external multipole depends only on the index of the multipole,
$\Psi_{cm,int}/\Psi_{cm,ext} = (m - 1)/(m + 1)$. Note, in particular, that the quadrupole
moment of an SIE is dominated by the matter outside any given radius, with
an internal quadrupole fraction of

$$f_{int} = \frac{\Psi_{c2,int}}{\Psi_{c2}} = \frac{1}{4}. \tag{B.74}$$

For lenses dominated by dark matter halos that have roughly flat global rotation
curves, most of the quadrupole moment is generated outside the Einstein
ring of the lens (i.e. by the halo!). This will hold provided any halo truncation
radius is large compared to the Einstein ring radius. The tangential
deflection is larger than the radial deflection, with $|\alpha_{cm,rad}/\alpha_{cm,tan}| = 1/m$.
The final question is the relative amplitudes between the poles. The ratio of
the angular deflection from the $m = 2$ quadrupole to the radial deflection of
the monopole is

$$\frac{\alpha_{c2,tan}}{\alpha_{0,rad}} = \frac{\epsilon}{3}\left[1 + \frac{\epsilon}{2} + \frac{9}{32}\epsilon^2 + \cdots\right] \tag{B.75}$$

while the ratio for the $m = 4$ pole is

$$\frac{\alpha_{c4,tan}}{\alpha_{0,rad}} = \frac{\epsilon^2}{20}\left[1 + \epsilon + \frac{19}{24}\epsilon^2 + \cdots\right] \tag{B.76}$$

where the axis ratio of the ellipsoid is $q = 1 - \epsilon$. Each higher order multipole
has an amplitude $\propto \epsilon^{m/2}$ to leading order.

Fig. B.22. Behavior of the angular multipoles for the de Vaucouleurs (solid), SIE
(dashed) and NFW (dotted) models with axis ratios of either $q = 0.75$ (Top) or
$q = 0.5$ (Bottom) as a function of radius from the lens center in units of the lens
major axis scale $R_{major}$. For each axis ratio, the lower panel shows the ratio of the
maximum angular deflections produced by the quadrupole ($m = 2$) and the $m = 4$
pole relative to the deflection produced by the monopole ($m = 0$). The upper panel
shows the fraction of the quadrupole generated by the mass interior to each radius.

The relative importance of the higher order poles can be assessed by 
computing the deflections for a typical lens with the monopole deflection 
(essentially the Einstein radius) fixed to be one arc second. Using the leading 
order scaling of the power-series, but setting the numerical value to be exact 
for an axis ratio $q = 1/2$, the angular deflection from the quadrupole is 0.″46$\epsilon$
and that from the $m = 4$ pole is 0.″09$\epsilon^2$, while the radial deflections will be
smaller by factors of 2 and 4 respectively. Since typical astrometric errors
are of order 0.″005, the quadrupole is quantitatively important for essentially
any ellipticity while the $m = 4$ pole becomes quantitatively important only
for $q \lesssim 0.75$ (and the $m = 6$ pole becomes quantitatively important for
$q \lesssim 0.50$).
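The SIE multipole amplitudes can be checked by brute force. The sketch below Fourier-decomposes the angular dependence of an SIE surface density, assumed here to be $\kappa \propto \theta^{-1}(\cos^2\chi + \sin^2\chi/q^2)^{-1/2}$ with the major axis along $\theta_1$, and converts each coefficient into a tangential-to-monopole deflection ratio using $\Psi_{cm} = -2A_m\theta/(m^2-1)$ and $\alpha_{cm,tan} \propto m\Psi_{cm}/\theta$ from the text. The assumed normalization of the ellipsoid and the function names are illustrative.

```python
import math

def sie_multipole_ratio(q, m, samples=4096):
    """Ratio of the m-th pole's tangential deflection amplitude to the monopole deflection."""
    a0 = 0.0
    am = 0.0
    for i in range(samples):
        chi = 2.0 * math.pi * (i + 0.5) / samples
        # Angular shape of the SIE surface density at fixed radius.
        shape = 1.0 / math.sqrt(math.cos(chi) ** 2 + (math.sin(chi) / q) ** 2)
        a0 += shape
        am += 2.0 * shape * math.cos(m * chi)
    a0 /= samples            # monopole coefficient
    am /= samples            # cos(m chi) coefficient
    # |alpha_tan| = 2 m A_m / (m^2 - 1) and alpha_0 = 2 A_0 for kappa = A / theta.
    return m * am / ((m * m - 1.0) * a0)

def series_ratio(q, m):
    """Power series of Eqns. B.75 (m = 2) and B.76 (m = 4) in eps = 1 - q."""
    eps = 1.0 - q
    if m == 2:
        return eps / 3.0 * (1.0 + eps / 2.0 + 9.0 * eps ** 2 / 32.0)
    if m == 4:
        return eps ** 2 / 20.0 * (1.0 + eps + 19.0 * eps ** 2 / 24.0)
    raise ValueError("only m = 2 and m = 4 are tabulated in this sketch")

if __name__ == "__main__":
    for q in (0.9, 0.75, 0.5):
        eps = 1.0 - q
        for m in (2, 4):
            numeric = sie_multipole_ratio(q, m)
            print(q, m, numeric, series_ratio(q, m), numeric / eps ** (m // 2))
```

For $q = 1/2$ the last column gives the coefficients multiplying $\epsilon$ and $\epsilon^2$ for a one arcsecond monopole deflection, which come out close to the values quoted in the text.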

In Fig. B.22 we compare the SIE to ellipsoidal de Vaucouleurs and NFW
models. Unlike the SIE, these models are not scale free, so the multipoles
depend on the distance from the lens center in units of the major axis scale
length of the lens, $R_{major}$. The behavior of the de Vaucouleurs model will be
typical of any ellipsoidal mass distribution that is more centrally concentrated 
than an SIE. Although the de Vaucouleurs model produces angular deflections 
similar to those of an SIE on small scales (for the same axis ratio), these 
are beginning to decay rapidly at the radii where we see lensed images (1— 
2R ma j or ) because most of the mass is interior to the image positions and the 
amplitudes of the higher order multipoles decay faster with radius than the 
monopole (see Ean. IB.48j l. Similarly, as more of the mass lies at smaller radii, 
the quadrupole becomes dominated by the internal quadrupole. The NFW 
model has a somewhat different behavior because on small scales it is less 
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centrally concentrated than an SIE (a p oc 1/r central density cusp rather 
than oc 1/r 2 ). It produces a somewhat bigger quadrupole for a given axis 
ratio, and an even larger fraction of that quadrupole is generated on large 
scales. In a "standard" dark matter halo model, the region with 9 < R ma jor 
is also where we see the lensed images. On larger scales, the NFW profile is 
more centrally concentrated than the SIE, so the quadrupole begins to decay 
and becomes dominated by the internal component. 

It is unlikely that mass distributions are true ellipsoids producing only 
even poles (m = 2, 4, ...) with no twisting of the axes with radius. For 
model fits we need to consider the likely amplitude of these deviations and 
the ability of standard terms to absorb and mask their presence. It is clear 
from Fig. B.22 that the amplitude of any additional terms must be of order 
the m = 4 deflections expected for an ellipsoid for them to be important. 
Here we illustrate the issues with the first few possible terms. 

A dipole moment (m = 1) corresponds to making the galaxy lopsided with 
more mass on one side of the lens center than the other. Lopsidedness is not 
rare in disk galaxies (~30% at large radii, Zaritsky & Rix 1997), but is little 
discussed (and hence presumably small) for early-type galaxies. Certainly in 
the CASTLES photometry of lens galaxies we never see significant dipole 
residuals. It is difficult (impossible) to have an equilibrium system supported 
by random stellar motions with a dipole moment because the resulting forces 
will tend to eliminate the dipole. Similar considerations make it difficult to 
have a dark matter halo offset from the luminous galaxy. Only disks, which are 
supported by ordered rather than random motion, permit relatively long-lived 
lopsided structures. Where a small dipole exists, it will have little effect on 
the lens models unless the position of the lens galaxy is imposed as a stringent 
constraint. The reason is that a dipole adds terms to the effective potential 
of the form $\theta_1 G(\theta)$ whose leading terms are degenerate with a change in the 
unknown source position. 

Perturbations to the quadrupole (relative to an ellipsoid) arise from vari- 
ations in the ellipticity or axis ratio with radius. Since realistic lens models 
require an independent external shear simply to model the local environ- 
ment, it will generally be very difficult to detect these types of perturbations 
or for these types of perturbations to significantly modify any conclusions. 
In essence, the amplitude and orientation of the external shear can capture 
most of their effects. Their actual amplitude is easily derived from
perturbation theory. For example, if there is an isophote twist of $\Delta\chi$ between the 
region inside the Einstein ring and outside the Einstein ring, the fractional 
perturbations to the quadrupole will be of order $\Delta\chi$, or approximately $\epsilon\Delta\chi/3$ 
of the monopole. Independent of the ability of the external shear to mimic 
the twist, the actual amplitude of the perturbation is approaching the typical 
measurement precision unless the twist is very large. Only in Q0957+561 have 
models found reasonably clear evidence for an effect arising from isophotal 
twists and ellipticity gradients, but both distortions are unusually large in 
this system (Keeton et al. 2000). In general, in the CASTLES photometry of 
lens galaxies, deviations from simple ellipsoidal models are rare. 

Locally we observe that the isophotes of elliptical galaxies are not perfect 
ellipses (e.g. Bender et al. 1989, Rest et al. 2001) and simulated halos show 
deviations of similar amplitude (Heyl et al. 1994, Burkert & Naab 2003). 
For lensing calculations it is useful to characterize these perturbations by a 
contribution to the lens potential and surface density of 

$$ \Psi = \frac{\epsilon_m}{m}\,\theta\cos m(\chi - \chi_m) \quad\hbox{and}\quad \kappa_m = \frac{\epsilon_m}{2\theta}\,\frac{1 - m^2}{m}\cos m(\chi - \chi_m) \tag{B.77} $$

respectively, where the amplitude of the term is related to the usual isophote 
parameter by $a_m = \epsilon_m|1 - m^2|/(m b)$ for a lens with Einstein radius b. A typical 
early-type galaxy might have $|a_4| \sim 0.01$, so the fractional effect on the 
deflections, $|\epsilon_4|/b \sim |a_4|/4 \sim 0.003$, will be comparable to the astrometric 
measurement accuracy. 
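
As a quick arithmetic check of the last estimate, the relation $a_m = \epsilon_m|1 - m^2|/(m b)$ can be inverted for the fractional deflection amplitude. This is only a sketch of that one-line conversion; the function name is hypothetical.

```python
def fractional_deflection(a_m, m):
    """Return |eps_m| / b given the isophote parameter a_m of order m.

    Inverts a_m = eps_m |1 - m^2| / (m b), which assumes the perturbation
    is measured relative to an isothermal monopole with Einstein radius b.
    """
    if m < 2:
        raise ValueError("the relation is only meaningful for m >= 2")
    return abs(a_m) * m / abs(1 - m * m)

if __name__ == "__main__":
    # A typical early-type galaxy with |a_4| ~ 0.01
    print(fractional_deflection(0.01, 4))
```

For m = 4 the conversion factor is 4/15, which is the "roughly 1/4" used in the text, so an Einstein radius of one arcsecond gives a deflection perturbation of a few milli-arcseconds.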

B.4.5 Constraining Angular Structure 

The angular structure of lenses is usually simply viewed as an obstacle to 
understanding the monopole. This is a serious mistake. The reason angular 
structure is generally ignored is that the ability to accurately constrain the 
angular structure of the gravitational field is nearly unique to gravitational 
lensing. Since we have not emphasized the ability of lenses to measure angular 
structure and other methods cannot do so very accurately, there has been 
little theoretical work on the angular structure of galaxies with dark matter. 
Both theoretical studies of halos and modelers of gravitational lenses need to 
pay more attention to the angular structure of the gravitational potential. 

We start by analyzing a simple two-image lens using our non-parametric 
model of the monopole (Eqn. B.67) in an external shear (Eqn. B.51). The two 
images are located at $\boldsymbol{\theta}_A = \theta_A(\cos\chi_A, \sin\chi_A)$ and
$\boldsymbol{\theta}_B = \theta_B(\cos\chi_B, \sin\chi_B)$, 
as illustrated in Fig. B.20. To illustrate the similarities and differences
between shear and convergence, we will also include a global convergence $\kappa_0$ 
in the model. This corresponds to adding a term to the lens potential of the 
form $(1/2)\kappa_0\theta^2$. The model now has five parameters: two shear components, 
the mass and surface density of the monopole model and the additional global 
convergence. We have only two astrometric constraints, and so can solve for 
only two of the five parameters. Since the enclosed mass is always an
interesting parameter, we can only solve for one of the two shear components. In 
general, we will find that the amplitude of $\gamma_c$ depends on the amplitude of $\gamma_s$. 
There is, however, a special choice of the shear axis, $\chi_\gamma = (\chi_A + \chi_B)/2 + \pi/4$, 
such that the shear parameters become independent of each other. This allows 
us to determine the "invariant" shear associated with the images, 
where $\Delta\theta = |\boldsymbol{\theta}_A - \boldsymbol{\theta}_B|$ is the image separation. The monopole mass and the 
other shear component are degenerate, 

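$$ b_B^2 + \gamma_2\theta_A\theta_B = \frac{(1 - \kappa_0)\left[\Delta\theta^2(\theta_A^2 + \theta_B^2) - (\theta_A^2 - \theta_B^2)^2\right] - \langle\kappa\rangle_{AB}\,(\theta_A^2 - \theta_B^2)(\Delta\theta^2 - \theta_A^2 + \theta_B^2)}{2\Delta\theta^2}. \tag{B.79} $$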

Several points are worth noting. First, the amplitude of the invariant shear $\gamma_1$ 
has the same degeneracy with the (local) surface density between the images 
$\langle\kappa\rangle_{AB}$ as it does with a global convergence $\kappa_0$. More centrally concentrated 
mass distributions with lower $\langle\kappa\rangle_{AB}$ require higher external shears to fit 
the same data. Second, the other component $\gamma_2$ introduces an uncertainty 
into the enclosed mass, with a series of somewhat messy trade offs between 
$b_B^2$, $\gamma_2$, $\langle\kappa\rangle_{AB}$ and $\kappa_0$. As a practical matter, the shear does not lead to an 
astronomically significant uncertainty in the mass, since $\gamma_2 \lesssim 0.1$ in all but 
the most extreme situations. 

The external shear is only one component of the quadrupole. There is also 
an internal shear due to the mass interior to the images (Eqn. B.52). The
internal and external shears differ in their "handedness". For the same angular 
deflection ($\partial\Psi/\partial\chi$) they have opposite signs for the radial deflection ($\partial\Psi/\partial\theta$). 
The solution for two images is much the same as for an external shear. There 
is an invariant shear component, whose amplitude scales with $1 - \kappa_0 - \langle\kappa\rangle_{AB}$ 
but whose orientation differs from that of the external shear solution. The 
monopole mass $b_B^2$ is degenerate with the $\gamma_2$ shear component and the $\kappa_0$ 
and $\langle\kappa\rangle_{AB}$ surface densities. The actual expressions are too complex to be 
illuminating. 

Fig. B.23 illustrates how the invariant shears combine to determine the 
overall structure of the quadrupole for the lens PG1115+080. For each image 
pair there is a line of permitted shears because of the degeneracy between the 
enclosed mass and the second shear component. The invariant shear component 
is the shear at the point where the line passes closest to the origin. If 
the quadrupole model is correct, the lines for all the image pairs will cross 
at a point, while if it is incorrect they will not. PG1115+080 is clearly going 
to be well modeled if the quadrupole is dominated by an external shear 
and poorly modeled if it is dominated by an internal shear. This provides a 
simple geometric argument for why full models of PG1115+080 are always 
dominated by an external shear (e.g. Impey et al. 1998). A failure of the 
curves to cross in both cases is primarily evidence for a mixture of external 
and internal quadrupoles or the presence of other multipoles rather than for 
a problem in the monopole mass distribution. In Fig. B.23 we used an SIS for 
the monopole. For a point mass monopole, the figure looks almost the same 
provided we expand the scale: the invariant shear scales as $1 - \langle\kappa\rangle_{AB}$, so in 
going from an SIS with $1 - \langle\kappa\rangle_{AB} = 1/2$ to a point mass with $1 - \langle\kappa\rangle_{AB} = 1$ 
the shear will double. 

This scaling of the quadrupole with the surface density of the monopole 
provides an as yet unused approach to studying the monopole. Since the mass 
enclosed by the Einstein radius is nearly constant, the more centrally
concentrated constant mass-to-light (M/L) ratio models must have lower surface 
mass densities near the images than the SIE model. As a result, they will
require quadrupole amplitudes that are nearly twice those of models like the 
SIS with nearly flat rotation curves. Since the typical SIE model of a lens 
has an ellipticity that is comparable to the typical ellipticities of the
visible galaxies, the more centrally concentrated monopole of a constant M/L 
model requires an ellipticity much larger than the observed ellipticity of the 
lens galaxy. The need to include an external tidal shear to represent the
environment allows these models to produce acceptable fits, but the amplitudes 
of the required external shears are inconsistent with expectations from weak 
lensing (Part 3). 

Fig. B.23. The invariant shears for the lens PG1115+080 modeled using either 
an external (top) or an internal (bottom) quadrupole and an SIS monopole. Each 
possible image pair among the A1, A2, B and C images constrains the quadrupole 
to lie on the labeled line. The amplitude and orientation of each invariant shear 
is given by the point where the corresponding line passes closest to the origin. 
Models of PG1115+080 show that the quadrupole is dominated by external (tidal) 
shear. Here we see that for the external quadrupole (top), the lines nearly cross at 
a point, so the data are consistent with an almost pure external shear. For an
internal quadrupole (bottom), the A2B and A2C image pairs require shear parameters 
completely inconsistent with the other images. 

B.4.6 Model Fitting and the Mass Distribution of Lenses 

Having outlined (in perhaps excruciating detail) how lenses constrain the 
mass distribution, we turn to the problem of actually fitting data. These days 
the simplest approach for a casual user is simply to download a modeling 
package, in particular the lensmodel package (Keeton 2001a) at 
http://cfa-www.harvard.edu/castles/, read the manual, try some experiments, 
and then apply it intelligently (i.e. 
read the previous sections about what you can extract and what you can- 
not!). Please publish results with a complete description of the models and 
the constraints using standard astronomical nomenclature. 

In most cases we are interested in the problem of fitting the positions 
$\boldsymbol{\theta}_i$ of i = 1 ... n images where the image positions have been measured with 
accuracy $\sigma_i$. We may also know the positions and properties of one or more 
lens galaxies. Time delay ratios also constrain lens models but sufficiently
accurate ratios are presently available for only one lens (B1608+656, Fassnacht 
et al. 2002). Fitting them is already included in most packages, and they add 
no new conceptual difficulties. Flux ratios constrain the lens model, but we 
are so uncertain of their systematic uncertainties due to extinction in the 
ISM of the lens galaxy, microlensing (Part 4) and the effects of substructure 
(see §B.8) that we can never impose them with the accuracy needed to add 
a significant constraint on the model. 

The basic issue with lens modeling is whether or not to invert the lens 
equations ("source plane" or "image plane" modeling). The lens equation 
supplies the source position 

$$ \boldsymbol{\beta}_i = \boldsymbol{\theta}_i - \boldsymbol{\alpha}(\boldsymbol{\theta}_i, \boldsymbol{p}) \tag{B.80} $$

predicted by the observed image positions $\boldsymbol{\theta}_i$ and the current model
parameters $\boldsymbol{p}$. Particularly for parametric models it is easy to project the images on 
to the source plane and then minimize the difference between the projected 
source positions. This can be done with a $\chi^2$ fit statistic of the form 

$$ \chi^2_{\rm src} = \sum_i \left(\frac{\boldsymbol{\beta} - \boldsymbol{\beta}_i}{\sigma_i}\right)^2 \tag{B.81} $$

where we treat the source position $\boldsymbol{\beta}$ as a model parameter. The astrometric 
uncertainties $\sigma_i$ are typically a few milli-arcseconds. Moreover, where VLBI 
observations give significantly smaller uncertainties, they should be increased 
to approximately $0.''001$ to $0.''005$ because low mass substructures in the lens 
galaxy can produce systematic errors on this order (see §B.8). You can
impose astrometric constraints to no greater accuracy than the largest deflection 
scales produced by lens components you are not including in your models. The 
advantage of $\chi^2_{\rm src}$ is that it is fast and has excellent convergence properties. 
The disadvantages are that it is wrong, cannot be used to compute parameter 
uncertainties, and may lead to a model producing additional images that are 
not actually observed. 

The reason it is wrong and cannot be used to compute parameter errors 
is that the uncertainty $\sigma_i$ in the image positions does not have any meaning 
on the source plane. This is easily understood if we Taylor expand the lens 
equation near the projected source point $\boldsymbol{\beta}_i$ corresponding to an image 

$$ \boldsymbol{\beta} - \boldsymbol{\beta}_i = M_i^{-1}\,(\boldsymbol{\theta} - \boldsymbol{\theta}_i) \tag{B.82} $$

where $M_i^{-1}$ is the inverse magnification tensor at the observed location of the 
image. In the frame where the tensor is diagonal, we have that $\Delta\beta_\pm = \lambda_\pm\Delta\theta_\pm$, 
so a positional error $\Delta\beta_\pm$ on the source plane corresponds to a positional error 
$\lambda_\pm^{-1}\Delta\beta_\pm$ on the image plane. Since the observed lensed images are almost 
always magnified (usually $\lambda_+ = 1 - \kappa + \gamma \simeq 1$ and
$0.5 > |\lambda_-| = |1 - \kappa - \gamma| > 0.05$), 
there is always one direction in which small errors on the source plane 
are significantly magnified when projected back onto the image plane. Hence, 
if you find solutions with $\chi^2_{\rm src} \sim N_{\rm dof}$, where $N_{\rm dof}$ is the number of degrees 
of freedom, you will have source plane uncertainties $\Delta\beta \sim \sigma_i$. However, the 
actual errors on the image plane are $\mu = |M|$ larger, so the $\chi^2$ on the image 
plane is $\sim \mu^2 N_{\rm dof}$ and you in fact have a terrible fit. 

Fig. B.24. (Top) Goodness of fit $\chi^2$ for cuspy models of B1933+503 as a function 
of the inner density exponent $\gamma$ ($\rho \propto r^{-\gamma}$) and the profile break radius a. Models 
with cusps significantly shallower or steeper than isothermal are ruled out, and 
acceptable models near isothermal must have break radii outside the region with 
the lensed images. 

Fig. B.25. (Bottom) The monopole deflections of the B1933+503 models for the 
range of permitted cusp exponents $\gamma$. The points show the radii of the lensed
images, and the models only constrain the shape of the monopole in this region. The 
monopole deflection is closely related to the square of the rotation curve. Note the 
similarity to Fig. B.21. 

If you assume that in any interesting model you are close to having a 
good solution, then this Taylor expansion provides a means of using the 
easily computed source plane positions to still get a quantitatively accurate 
fitting statistic, 

$$ \chi^2_{\rm int} = \sum_i \frac{(\boldsymbol{\beta} - \boldsymbol{\beta}_i)\cdot M_i^2\cdot(\boldsymbol{\beta} - \boldsymbol{\beta}_i)}{\sigma_i^2}, \tag{B.83} $$

in which the magnification tensor $M_i$ is used to correct the error in the 
source position to an error in the image position. This procedure will be 
approximately correct provided the observed and model image positions are 
close enough for the Taylor expansion to be valid. Finally, there is the exact 
statistic where for the model source position $\boldsymbol{\beta}$ you numerically solve the 
lens equation to find the exact image positions $\boldsymbol{\theta}_i(\boldsymbol{\beta})$ and then compute the 
goodness of fit on the image plane 

$$ \chi^2_{\rm img} = \sum_i \left(\frac{\boldsymbol{\theta}_i(\boldsymbol{\beta}) - \boldsymbol{\theta}_i}{\sigma_i}\right)^2. \tag{B.84} $$

This will be exact even if the Taylor expansion of $\chi^2_{\rm int}$ is breaking down, and 
if you find all solutions to the lens equations you can verify that the model 
predicts no additional visible images. Unfortunately, using the exact $\chi^2_{\rm img}$ is 
also a much slower numerical procedure. 
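
The difference between the source plane statistic and its magnification-corrected version can be made concrete with a small sketch. It assumes a toy lens, an SIS of Einstein radius b plus an external shear $\gamma$ aligned with the x axis (potential $\gamma(x^2 - y^2)/2$), and all function names are hypothetical rather than taken from lensmodel or any other package. The exact image plane statistic is omitted because it requires solving the lens equation numerically.

```python
import math

def deflection(theta, b, gamma):
    """Deflection of an SIS (Einstein radius b) plus shear gamma along x."""
    x, y = theta
    r = math.hypot(x, y)
    return (b * x / r + gamma * x, b * y / r - gamma * y)

def source_position(theta, b, gamma):
    """Project an observed image onto the source plane (lens equation)."""
    ax, ay = deflection(theta, b, gamma)
    return (theta[0] - ax, theta[1] - ay)

def inverse_magnification(theta, b, gamma):
    """Components (a11, a12, a22) of the symmetric tensor d(beta)/d(theta)."""
    x, y = theta
    r3 = math.hypot(x, y) ** 3
    return (1.0 - gamma - b * y * y / r3, b * x * y / r3,
            1.0 + gamma - b * x * x / r3)

def chi2_src(images, sigmas, beta, b, gamma):
    """Source plane statistic: offsets measured on the source plane."""
    total = 0.0
    for theta, sig in zip(images, sigmas):
        bx, by = source_position(theta, b, gamma)
        total += ((beta[0] - bx) ** 2 + (beta[1] - by) ** 2) / sig ** 2
    return total

def chi2_int(images, sigmas, beta, b, gamma):
    """Source plane offsets mapped to the image plane with the tensor M."""
    total = 0.0
    for theta, sig in zip(images, sigmas):
        bx, by = source_position(theta, b, gamma)
        dx, dy = beta[0] - bx, beta[1] - by
        a11, a12, a22 = inverse_magnification(theta, b, gamma)
        det = a11 * a22 - a12 * a12
        # Apply M = (M^-1)^-1 to the source plane offset
        tx = (a22 * dx - a12 * dy) / det
        ty = (-a12 * dx + a11 * dy) / det
        total += (tx * tx + ty * ty) / sig ** 2
    return total

if __name__ == "__main__":
    # SIS with b = 1 and a source at (0.2, 0): images at (1.2, 0), (-0.8, 0).
    # Shift the first image tangentially by 0.01 to mimic a position error.
    images = [(1.2, 0.01), (-0.8, 0.0)]
    sigmas = [0.005, 0.005]
    print(chi2_src(images, sigmas, (0.2, 0.0), 1.0, 0.0))
    print(chi2_int(images, sigmas, (0.2, 0.0), 1.0, 0.0))
```

In this example the displaced image has a tangential magnification of about 6, so the corrected statistic is larger than the source plane statistic by roughly the square of that factor, which is the effect described in the text.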

As we discussed earlier, even though lens models provide the most
accurate mass normalizations in astronomy, they can constrain the mass
distribution only if the source is more complex than a single compact component. 
Here we only show examples where there are multiple point-like components, 
deferring discussions of models with extended source structure to §B.10. The 
most spectacular example of a multi-component source is B1933+503 (Sykes 
et al. 1998, see Fig. B.7) where a source consisting of a radio core and two 
radio lobes has 10 lensed images because the core and one lobe are
quadruply imaged and the other lobe is doubly imaged. Since we have many images 
spread over roughly a factor of two in radius, this lens should constrain the
radial mass distribution just as in our discussion for §B.4.3. Muñoz et al. (2001, 
also see Cohn et al. 2001 for softened power law models) fitted this system 
with cuspy models (Eqn. 55 with $\alpha = 2$ and m = 4), varying the inner density 
slope $n = \gamma$ ($\rho \propto r^{-n}$) and the break radius a. Fig. B.24 shows the resulting 
$\chi^2$ as a function of the parameters and Fig. B.25 illustrates the range of the 
acceptable monopole mass distributions. Both are very similar to Fig. B.21. 
The best fit is for $\gamma = 1.85$ with an allowed range of $1.6 < \gamma < 2.0$ that
completely excludes the shallow $\gamma = 1$ cusps of the Hernquist and NFW profiles 
and is marginally consistent with the $\gamma = 2$ cusp of the SIS model. A second 
example, which illustrates how the distribution of mass well outside the
region with images has little effect on the models, is provided by the Winn et al. (2003) 
models of the three-image lens PMNJ1632-0033 shown in Fig. B.26. In these 
models the outer slope $\eta$ of the density, with $\rho \propto r^{-\eta}$ asymptotically, was 
also explored but has little effect on the results. Unless the break radius of 
the profile is interior to the B image, the mass profile is required to be close 
to isothermal, $1.89 < \beta < 1.93$. 

Unfortunately, systems like B1933+503 and PMNJ1632-0033 are a small 
minority of lens systems. For most lenses, obtaining information on the radial 
density profile requires some other information such as a dynamical
measurement (§B.4.9), a time delay measurement (§B.5) or a lensed extended 
component of the source (§B.10). Even for these systems, it is important to 
remember that the actual constraints on the density structure really only 
apply over the range of radii spanned by the lensed images: the mass
interior to the images is constrained but its distribution is not, while the mass 
exterior to the images is completely unconstrained. This is not strictly true 
when we include the angular structure of the gravitational field and the mass 
distribution is quasi-ellipsoidal. 

Fig. B.26. Allowed parameters for cuspy models of PMNJ1632-0033 assuming 
that image C is a true third image. Each panel shows the constraints on the inner 
density cusp $\beta$ ($\rho \propto r^{-\beta}$) and the break radius $r_b$ for three different asymptotic 
density slopes $\rho \propto r^{-\eta}$. A Hernquist model has $\beta = 1$ and $\eta = 4$, an NFW model 
has $\beta = 1$ and $\eta = 3$, and a pseudo-Jaffe model has $\beta = 2$ and $\eta = 4$. Unless the 
break radius is placed interior to the B image, the inner cusp is restricted to be 
close to isothermal ($\beta = 2$). 

It is also important to keep some problems with parametric models in 
mind. First, models that lack the degrees of freedom needed to describe the 
actual mass distribution can be seriously in error. Second, models with too 
many degrees of freedom can be nonsense. We can illustrate these two lim- 
iting problems with the sad history of Q0957+561 for the first problem and 
attempts to explain anomalous flux ratios (see §B.8) with complex angular 
structures in the density distribution for the dark matter for the second. 

Q0957+561, the first lens discovered (Walsh, Carswell & Weymann 1979) 
and the first lens with a well measured time delay (see §B.5, Schild &
Thomson 1997, Kundić et al. 1997 and references therein), is an ideal lens for 
demonstrating the trouble you can get into using parametric models
without careful thought. The lens consists of a cluster and its brightest
cluster galaxy with two lensed images of a radio source bracketing the galaxy. 
VLBI observations (e.g. Garrett et al. 1994) resolve the two images into thin, 
multi-component jets with very accurately measured positions (uncertainties 
as small as 0.1 mas, corresponding to deflections produced by a mass 
scale $\sim 10^{-8}$ of the primary lens!). Models developed along two lines. One 
line focused on models in which the cluster was represented as an external 
shear (e.g. Grogin & Narayan 1996, Chartas et al. 1998, Barkana et al. 1999, 
Chae 1999) while the other explored more complex models for the cluster 
(see Kochanek 1991b, Bernstein, Tyson & Kochanek 1993, Bernstein &
Fischer 1999) and argued that external shear models had too few parameters 
to represent the mass distribution given the accuracy of the constraints. The 
latter view was borne out by the morphology of the lensed host galaxy (Keeton 
et al. 2000) and direct X-ray observations of the cluster (Chartas et al. 2002) 
which showed that the lens galaxy was within about one Einstein radius of 
the cluster center where a tidal shear approximation fails catastrophically. 
The origin of the problem is that as a two-image lens, Q0957+561 is
critically short of constraints unless the fine details of the VLBI jet structures 
are included in the models. Many studies imposed these constraints to the 
limit of the measurements while not including all possible terms in the
potential which could produce a deflection on that scale (i.e. the precision should 
have been restricted to milli-arcseconds rather than micro-arcseconds).
Models would adjust the positions and masses of the cluster and the lens galaxy 
in order to reproduce the small scale astrometric details of the VLBI jets 
without including less massive components of the mass distribution (e.g. the 
ellipticity gradient and isophote twist of the lens galaxy, Keeton et al. 2000) 
that also affect the VLBI jet structure on these angular scales. Lens models 
must contain all reasonable structures producing deflections comparable to 
the scale of the measurement errors. 

We are in the middle of an experiment exploring the second problem: 
if you include small scale structures but lack the constraints needed to
measure them, their masses easily become unreasonable unless constrained by 
common sense, physical priors or additional data. Lately this has become an 
issue in studies (Evans & Witt 2003, Kochanek & Dalal 2004, Quadri, Möller 
& Natarajan 2003, Kawano et al. 2004) of whether the flux ratio
anomalies in gravitational lenses could be due to complex angular structure in the 
lens galaxy rather than CDM substructure or satellites in the lens galaxy (see 
§B.8). The problem, as we discuss in the next section on non-parametric
models (§B.4.7), is that lens modeling with large numbers of parameters is closely 
related to solving linear equations with more variables than constraints. As 
the matrix inversion necessary to finding a solution becomes singular, the 
parameters of the mass distribution show wild, large amplitude fluctuations 
even as the fit to the constraints becomes perfect. Thus, a model including 
enough unconstrained parameters is guaranteed to "solve" the anomalous 
flux ratio problem even if it should not. For example, Evans & Witt (2003) 
could match the flux ratios of Q2237+0305 even though for this lens we know 
from the time variability of the flux ratios that the flux ratio anomalies are 
created by microlensing rather than complex angular structures in the lens 
model (see Part 4). 

If only the four compact images are modeled, then the flux ratio
anomalies can be greatly reduced or eliminated in almost all lenses at the price of 
introducing deviations from an ellipsoidal density distribution far larger than 
expected (see §B.4.4). In some cases, however, you can test these solutions
because the lens has extra constraints beyond the four compact images. We
illustrate this in Fig. B.27. By adding large amplitude $\cos 3\theta$ and $\cos 4\theta$
perturbations to the surface density model for B1933+503, Kochanek & Dalal (2004) 
could reproduce the observed image flux ratios if they fit only the four
compact sources. However, after adding the constraints from the other lensed 
components, the solution is driven back to being nearly ellipsoidal and the 
flux ratios cannot be fit. In every case, Kochanek & Dalal (2004) found that 
the extra constraints drove the solution back toward an ellipsoidal density 
distribution. In short, a sufficiently complex model can fit underconstrained 
data, but that does not mean it makes any sense to do so. 

B.4.7 Non-Parametric Models 

The basic idea behind non-parametric mass models is that the effective lens 
potential and the deflection equations are linear "functions" of the surface 
density. The surface density can be decomposed into multipoles (Kochanek 1991a, 
Trotter, Winn & Hewitt 2000, Evans & Witt 2003), pixels (see Saha & 
Williams 1997, 2004, Williams & Saha 2000), or any other form in which 
the surface density is represented as a linear combination of density
functionals multiplied by unknown coefficients $\boldsymbol{\kappa}$. In any such model, the lens 
equation for image i takes the form 

$$ \boldsymbol{\beta} = \boldsymbol{\theta}_i - A_i\,\boldsymbol{\kappa} \tag{B.85} $$

where $A_i$ is the matrix that gives the deflection at the position of image i 
in terms of the coefficients of the surface density decomposition $\boldsymbol{\kappa}$. For a 
lens with i = 1 ... n images of the same source, such a system can be solved 
exactly if there are enough degrees of freedom in the description of the surface 
density. For simplicity, consider a two-image lens so that we can eliminate 
the source position by hand, leaving the system of equations 

$$ \boldsymbol{\theta}_1 - \boldsymbol{\theta}_2 = (A_1 - A_2)\,\boldsymbol{\kappa}, \tag{B.86} $$

which is easily solved by simply taking the inverse of the matrix $A_1 - A_2$ to 
find that 

$$ \boldsymbol{\kappa} = (A_1 - A_2)^{-1}(\boldsymbol{\theta}_1 - \boldsymbol{\theta}_2). \tag{B.87} $$

Fig. B.27. Surface density contours for models of B1933+503 including misaligned 
$a_3$ and $a_4$ multipoles (thin lines). The model in the top panel is constrained only by 
the 4 compact images (images 1, 3, 4 and 6, filled squares). The model in the bottom 
panel is also constrained by the other images in the lens (the two-image system 1a/8, 
open squares; the four-image system 2a/2b/5/7, filled triangles; and the two-image 
system comprising parts of 5/7, open triangles). The tangential critical line of the 
model (heavy solid curve) must pass between the merging images 2a/2b, but fails 
to do so in the first model (top panel). 

Sadly, life is not that simple because as soon as the density decomposition has 
more degrees of freedom than there are constraints, the inverse $(A_1 - A_2)^{-1}$ 
of the deflection operators is singular. 

The solution to this problem is to instead consider the problem as a more 
general minimization problem with a $\chi^2$ statistic for the constraints and some 
form of regularization to restrict the results to plausible surface densities. One 
possibility is linear regularization, in which you minimize the function 

$$ F = \chi^2 + \lambda\,\boldsymbol{\kappa}\cdot H\cdot\boldsymbol{\kappa} \tag{B.88} $$

where the $\chi^2$ measures the goodness of fit to the lens constraints, H is a 
weight matrix and $\lambda$ is a Lagrange multiplier. The Lagrange multiplier
controls the relative importance given to fitting the lens constraints (minimizing 
the $\chi^2$) versus producing a smooth density distribution (minimizing $\boldsymbol{\kappa}\cdot H\cdot\boldsymbol{\kappa}$). 
The simplest smoothing function is to minimize the variance of the surface 
density (H = I, the identity matrix), or, equivalently, ignore H and use the 
singular value decomposition for inverting a singular matrix. By using more 
complicated matrices you can minimize derivatives of the density (gradients, 
curvature etc.). Solutions are found by adjusting the multiplier $\lambda$ until the 
goodness of fit satisfies $\chi^2 \sim N_{\rm dof}$ where $N_{\rm dof}$ is the number of degrees 
of freedom. Another solution is to use linear programming methods to
impose constraints such as positive surface densities, negative density gradients 
from the lens center or density symmetries (Saha & Williams 1997, 2004, 
Williams & Saha 2000). Time delays, which are also linear functions of the 
surface density, are easily included. Flux ratios are more challenging because 
magnifications are quadratic rather than linear functions of the surface
density except for the special case of the generalized singular isothermal models 
where $\Psi = b\theta F(\chi)$ (Eqn. B.42, Witt, Mao & Keeton 2000, Kochanek et 
al. 2001, Evans & Witt 2001). The best developed, publicly available
non-parametric models are those by Saha & Williams (2004). These are available 
at http://ankh-morpork.maths.qmc.ac.uk/~saha/astron/lens/ 
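
A minimal sketch of linear regularization with H = I may help make the role of the multiplier concrete. It assumes unit measurement errors, so that $\chi^2 = |A\boldsymbol{\kappa} - \boldsymbol{d}|^2$, and a made-up 2x4 deflection matrix standing in for $A_1 - A_2$; neither the matrix nor the function names come from any published lens code. Minimizing F then gives the normal equations $(A^T A + \lambda I)\boldsymbol{\kappa} = A^T\boldsymbol{d}$, which are solvable even though $A^T A$ alone is singular.

```python
import math

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def regularized_fit(a, d, lam):
    """Minimise |a k - d|^2 + lam |k|^2, i.e. Eqn. (B.88) with H = I."""
    ncol = len(a[0])
    ata = [[sum(row[i] * row[j] for row in a) + (lam if i == j else 0.0)
            for j in range(ncol)] for i in range(ncol)]
    atd = [sum(row[i] * d[r] for r, row in enumerate(a)) for i in range(ncol)]
    return solve(ata, atd)

def residual_norm(a, d, k):
    """Length of the misfit vector a k - d."""
    return math.sqrt(sum((sum(x * y for x, y in zip(row, k)) - d[r]) ** 2
                         for r, row in enumerate(a)))

def norm(v):
    return math.sqrt(sum(x * x for x in v))

if __name__ == "__main__":
    # Two constraints, four unknown coefficients: underdetermined on its own
    A = [[1.0, 2.0, 0.0, 1.0],
         [0.0, 1.0, 3.0, 1.0]]
    D = [1.0, 2.0]
    for lam in (1e-6, 1e-2, 1.0):
        k = regularized_fit(A, D, lam)
        print(lam, residual_norm(A, D, k), norm(k))
```

Increasing the multiplier trades a worse fit to the constraints for a smaller (smoother) coefficient vector, so in practice it would be tuned until the fit statistic reaches the target value described above.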

Personally, I am not a fan of the non-parametric models, because almost 
all the additional degrees of freedom they include are irrelevant to the prob- 
lem. As I have tried to outline in the preceding sections, there is no real 
ambiguity about the aspects of gravitational potentials either constrained or 
unconstrained by lens models. Provided the parametric models capture these 
degrees of freedom and you do not get carried away with the precision of the 
fits, you can ignore deviations of the $\cos(16\chi)$ term of the surface density 
from that expected for an ellipsoidal model. Similarly for the monopole pro- 
file, the distribution of mass interior and exterior to the images is irrelevant 
and for the most part only the mean surface density between the images 
has any physical effect. Nothing is gained by allowing arbitrary, fine-grained 
distributions. 

There are also specific physical and mathematical problems with
non-parametric models just as there are for parametric models. First, the trick of
linearization only works if the lens equations are solved on the source plane.
As we discussed when we introduced model fitting (§B.4.6), this makes it
impossible to properly compute error bars on any parameters. The equations
become non-linear if they include either the magnification tensor (Eqn. B.83) or
use the true image plane fit statistic (Eqn. B.84), and this greatly reduces the
attractiveness of these methods. Second, in many cases the non-parametric
models are not constrained to avoid creating extra images not seen in the
observations - the models reproduce the observed images exactly, but come
with no guarantee that they are not producing 3 other images somewhere else.
Third, it is very difficult to guarantee that the resulting models are physical.
For example, consider a simple spherical lens constrained to have positive
surface density. For the implied three-dimensional density to also be positive
definite, the surface density must decline monotonically from the center of
the lens. This constraint is usually applied by the Saha & Williams (2004)
method. For the distribution function of the stars making up the galaxy to
be positive definite, the three-dimensional density must also decline
monotonically - this implies a constraint on the second derivative of the surface
density which is not imposed by any of these methods. For the distribution
to be dynamically stable it must satisfy a criterion on the derivative of the
distribution function with respect to the orbital energy, and this implies a
criterion on the third derivative of the surface density which is also not
imposed (see Binney & Tremaine 1987). Worse yet, for a non-spherical system
we cannot even write down the constraints on the surface density required
for the model to correspond to a stable galaxy with a positive definite
distribution function. In short, most non-parametric models will be unphysical
- they overestimate the degrees of freedom in the mass distribution. The
critique being made, non-parametric models have a role because they define the
outer limits of what is possible by avoiding the strong physical priors implicit
in parametric models of galaxies.

B.4.8 Statistical Constraints on Mass Distributions 

Where individual lenses may fail to constrain the mass distribution, ensembles 
of lenses may succeed. There are two basic ideas behind statistical constraints 
on mass distributions. The first idea is that models of individual lenses should 
be weighted by the likelihood of the observed configuration given the model 
parameters. The second idea is that the statistical properties of lens samples 
should be homogeneous. 

An example of weighting models by the likelihood is the limit on the slopes
of central density cusps from the observed absence of central images. Rusin
& Ma (2001) considered 6 CLASS (see §B.6) survey radio doubles and
computed the probability $p_i(n)$ that lens $i$ would have a detectable third image in
the core of the lens assuming power law mass densities $\Sigma \propto R^{1-n}$ and
including a model for the observational sensitivities and the magnification bias (see
§B.6.6) of the survey. They were only interested in the range $n < 2$, because
as discussed in §B.3 density cusps with $n > 2$ never have central images.
For most of the lenses they considered, it was possible to find models of the
6 lenses that lacked detectable central images over a broad range of density
exponents. However, the shallower the cusp, the smaller the probability
$1 - p_i(n)$ of producing a lens without a visible central image. For any single
lens, $p_i(n)$ varies too little to set a useful bound on the exponent, but the joint
probability of the entire sample having no central images,
$P = \prod_i (1 - p_i(n))$, leads to a strong (one-sided) limit that $n > 1.78$ at 95%
confidence (see Fig. B.28).
In practice, Keeton (2003a) demonstrated that the central stellar densities
are sufficiently high to avoid the formation of visible central images in almost
all lenses given the dynamic ranges of existing radio observations (i.e.
stellar density distributions are sufficiently cuspy), and central black holes can
also assist in suppressing the central image (Mao, Witt & Koopmans 2001).
However, the basic idea behind the Rusin & Ma (2001) analysis is important
and underutilized.

Fig. B.28. Limits on the central density exponent for power-law density profiles
$\rho \propto r^{-n} = r^{-1-\beta}$ from the absence of detectable central images in a sample of
6 CLASS survey radio doubles (Rusin & Ma 2001). The horizontal axis is the
profile slope $\beta$. The lighter curves show the
limits for the individual lenses with the weakest constraint from B0739+366 and the
strongest from B0218+357, and the heavy solid curve shows the joint probability
P.
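
The way weak individual constraints combine into a strong joint limit can be illustrated with a toy calculation. Everything numerical here is hypothetical: the logistic form of `p_central`, the per-lens values in `N_HALF`, and the choice to read the limit off directly where the joint probability equals 5% are assumptions made for the sketch, not the sensitivity model or the statistical treatment of Rusin & Ma.

```python
import math

# Hypothetical cusp slopes at which each of six lenses has a 50% chance of
# showing a detectable central image.
N_HALF = [1.55, 1.60, 1.65, 1.70, 1.75, 1.80]

def p_central(n, n_half, width=0.1):
    """Toy probability of a detectable central image; falls as the cusp steepens."""
    return 1.0 / (1.0 + math.exp((n - n_half) / width))

def joint_no_central(n):
    """Joint probability that none of the lenses shows a central image."""
    prob = 1.0
    for n_half in N_HALF:
        prob *= 1.0 - p_central(n, n_half)
    return prob

def lower_limit(prob_fn, level=0.05, lo=1.0, hi=2.0, tol=1e-9):
    """Bisect for the slope at which a rising probability reaches `level`."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_fn(mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

joint_limit = lower_limit(joint_no_central)
single_limits = [lower_limit(lambda n, nh=nh: 1.0 - p_central(n, nh))
                 for nh in N_HALF]
print("single-lens limits:", [round(x, 3) for x in single_limits])
print("joint limit:", round(joint_limit, 3))
```

In this toy the joint limit on $n$ is tighter than the limit from any single lens, which is the point of multiplying the individual probabilities.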

An example of requiring the lenses to be homogeneous is the estimate
of the misalignment between the major axis of the luminous lens galaxy
and the overall mass distribution by Kochanek (2002b). Fig. B.29 shows the
misalignment angle $\Delta\chi_{LM} = |\chi_L - \chi_M|$ between the major axis $\chi_L$ of the
lens galaxy and the major axis $\chi_M$ of an ellipsoidal mass model for the lens.
The particular mass model is unimportant because any single component
model of a four-image lens will give a nearly identical value for $\chi_M$ (e.g.
Kochanek 1991a; Wambsganss & Paczynski 1994). The distribution of the
misalignment angle $\Delta\chi_{LM}$ is not consistent with the mass and the light being
either perfectly correlated or uncorrelated. This is not surprising, because a
simple ellipsoidal model determines the position angle of the mean quadrupole
moment near the Einstein ring, which is a combination of the quadrupole
moment of the lens galaxy, the halo of the lens galaxy, and the local tidal
shear (see §B.4.4). Even if the lens galaxy and the halo were perfectly aligned,
we would still find that the orientation of the mean quadrupole would differ
from that of the light because of the effects of the tidal shears. We can model
this by estimating the probability of reproducing the observed misalignment
distribution in terms of the strength of the local tidal shear $\gamma_{rms}$ and the
dispersion $\sigma_\chi$ in the angle between the major axis of the mass distribution
and the light, as shown in Fig. B.30. The observed mismatch can either
be produced by having a typical tidal shear of $\gamma_{rms} \simeq 0.05$ or by having a
typical misalignment between mass and light of $\sigma_\chi \simeq 20°$. We know, however,
that the typical tidal shear cannot be zero because it can be estimated from
the statistics of galaxies (e.g. Keeton, Kochanek & Seljak 1997; Holder &
Schechter 2003). Keeton et al. (1997) obtained $\gamma_{rms} \simeq 0.05$, in which case
mass must align with light and we obtain an upper limit of $\sigma_\chi \lesssim 10°$. Holder
& Schechter (2003) argue for a much higher rms shear of $\gamma_{rms} = 0.15$ based
on N-body simulations, which is too high to be consistent with the observed
alignment of mass models and the luminous galaxy. One possible explanation
(based on the results of White, Hernquist & Springel 2001) is that Holder &
Schechter (2003) included parts of the lens galaxy's own halo in their estimate
of the external shear. Alternatively, if lens galaxies are more compact than
the SIE model used by Kochanek (2002b), then the lower surface density
$\langle\kappa\rangle$ raises the required shear (since $\gamma \propto 1 - \langle\kappa\rangle$, Eqn. B.78). However,
mass distributions similar to constant mass-to-light ratio models of the lenses
would be required, which would be inconsistent with shear estimates from
simulations in which galaxy masses are dominated by extended dark matter
halos.

Fig. B.29. (Top) The integral distribution of misalignment angles $\Delta\chi_{LM}$ between
the major axes of the lens galaxy and an ellipsoidal lens model (solid curve with
points for each lens). If the two angles were completely uncorrelated, the distribution
would follow the dashed line. If the two angles were perfectly correlated they would
follow the solid curve because of the measurement uncertainties in the two angles.

Fig. B.30. (Bottom) Logarithmic contours of the probability for matching the
distribution of misalignment angles as a function of the rms misalignment $\sigma_\chi$ between
the mass and the light and the typical tidal shear $\gamma_{rms}$. Theoretically we expect
tidal shears $\gamma_{rms} \simeq 0.06$. The solid contours are spaced by 0.5 dex and the dashed
contours are spaced by 0.1 dex relative to the maximum likelihood contour. The
differences between dashed contours are not statistically significant, while those
between solid contours are statistically significant.

Fig. B.31. The internal shear fraction $f_{int}$ for the four-image lenses. Each system
was fitted by an SIS combined with an internal shear and an external shear and
$f_{int} = |\Gamma|/(|\Gamma| + |\gamma|)$ is the fraction of the quadrupole amplitude due to the internal
shear. An SIE has $f_{int} = 1/4$ (see Fig. B.22). Most of the quads have $f_{int} \lesssim 1/4$
as expected for an SIE in an additional external (tidal) shear field. Objects
with very low $f_{int}$ (e.g. HE0435-1223, RXJ0911+0551, B1422+231) have nearby
galaxies or clusters generating anomalously large external shears, while objects with
anomalously high $f_{int}$ (B1608+656, HE0230-2130, MG0414+0534) tend to have
additional lens components like the second lens galaxy of B1608+656. For some
systems either the imaging data (e.g. B0128+437) or the models (e.g. B2045+265)
do not allow a clear qualitative explanation.

Fig. B.32. The structure of lens galaxies in self-similar models. The top row shows
the permitted region for the slope of the inner dark matter cusp ($\rho \propto r^{-n}$) and
the projected fraction of the mass $f_{CDM}$ inside $2R_e$ composed of dark matter. The
results are shown for three ratios $R_b/R_e$ between the break radius $R_b$ of the dark
matter profile and the effective radius $R_e$ of the luminous galaxy. The solid (dashed)
contours show the 68% and 95% confidence levels for two (one) parameter. Note
that the estimates of $n$ and $f_{CDM}$ depend little on the location of the break radius
relative to the effective radius. The bottom row shows all the mass profiles lying
within the (two parameter) 68% confidence region normalized to a fixed projected
mass inside $2R_e$. For comparison we show the mass enclosed by a de Vaucouleurs
model (dotted line) and an SIS (offset dashed line). While the allowed models
exhibit a wide range of dark matter abundances, slopes and break radii, they all
have roughly isothermal total mass profiles over the radial range spanned by the
lensed images.

The trade-off between central concentration and shear leads to the
interesting question of where the quadrupole structure of lenses originates.
As we discussed in §B.4.4, we can break up the quadrupole of the mass
distribution into the internal quadrupole due to the matter interior to the Einstein
ring (Eqn. B.52) and the exterior quadrupole due to the matter outside the
Einstein ring (also defined in §B.4.4). While the internal quadrupole is due only to the
lens galaxy, the external quadrupole is a mixture of the quadrupole from
the parts of the galaxy outside the Einstein ring (i.e. the dark matter halo)
and the tidal shear from the environment. An important fact to remember
is that for an isothermal ellipsoid, only $f_{int} = 25\%$ of the quadrupole is due
to mass inside the Einstein ring (see Fig. B.22, §B.4.4)! Turner, Keeton &
Kochanek (2004) explored this by fitting all the available four-image lenses
with an SIS monopole combined with an internal and an external quadrupole.
They then computed the fraction of the quadrupole $f_{int}$ associated with the
mass interior to the Einstein ring to find the distribution shown in Fig. B.31.
Most four-image lenses seem to be dominated by the external quadrupole,
with internal quadrupole fractions below the $f_{int} = 0.25$ fraction expected for
an isothermal ellipsoid. Lenses clearly in environments with very large tidal
shears (e.g. RXJ0911+0551, which is near a massive cluster, Bade et al. 1997;
Kneib et al. 2000; Morgan et al. 2001; or HE0435-1223, which is near a large
galaxy, Wisotzki et al. 2002, see Fig. B.4) show much smaller internal shear
fractions. B1608+656 (Myers et al. 1995; Fassnacht et al. 1996), which has
two lens galaxies inside the Einstein ring, shows a significantly higher
internal quadrupole fraction. Combined with the close correlation of mass model
alignments with the luminous galaxies, this seems to argue for significant
dark matter halos aligned with the luminous galaxy, but the final step of
quantitatively assembling all the pieces has yet to be done.

Statistical analyses can also be used to estimate the radial density
distribution from samples of lenses that cannot constrain it individually. The existence
of the fundamental plane (see §B.9) strongly suggests that the structure
of early-type galaxies is fairly homogeneous - in particular it is consistent
with galaxies having self-similar mass distributions in the sense that the halo
structure can be scaled from the structure of the visible galaxy. As a
particular example based on our theoretical expectations, Rusin, Kochanek &
Keeton (2003) and Rusin & Kochanek (2004) modeled the visible galaxy with
a Hernquist (Eqn. B.56) model scaled to match the observed effective radius
of the lens galaxy, $R_e$, and then added a cuspy dark matter halo (Eqn. B.59
with a variable inner cusp $n$, $\alpha = 2$ and $m = 3$) where the inner density cusp
($\rho \propto r^{-n}$), the halo break radius $r_b$ and the dark matter fraction $f_{CDM}$ inside
$2R_e$ were kept as variables. The assumption of self-similarity enters by
keeping the ratio $r_b/R_e$ constant, the dark matter fraction $f_{CDM}$ constant, and
then scaling the mass-to-light ratio of the stars $\Upsilon \propto L^x$ with the luminosity.$^4$
We recover the fundamental plane in this model when $x \simeq 0.25$. Putting all
the pieces together, the projected mass inside radius $R$ is

$$ M(<R) = \Upsilon_* L_* \left[\frac{L(0)}{L_*}\right]^{1+x} \left[ g(R/R_e) + g(2)\,\frac{f_{CDM}}{1-f_{CDM}}\, m_{CDM}(R/R_e) \right] \qquad (B.89) $$

where $\Upsilon_*$ is the mass-to-light ratio of the stars in an $L_*$ galaxy, $\log L(0) =
\log L(z) - e(z)$ is the luminosity of the lens galaxy evolved to redshift zero
(where we discuss estimates of the evolution rate $e(z)$ in §B.9), $g(x)$ is the
fraction of the light inside dimensionless radius $x = R/R_e$ ($g(1) = 1/2$)
and $m_{CDM}(x)$ is the dimensionless dark matter mass inside radius $x$ with
$m_{CDM}(2) = 1$ so that the CDM mass fraction inside $x = 2$ is $f_{CDM}$.
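
As a check on how the pieces of the projected mass expression (Eqn. B.89) fit together, the following sketch evaluates the stellar and dark terms separately. The profile functions `toy_g` and `toy_m_cdm` are hypothetical stand-ins chosen only to satisfy the stated normalizations $g(1) = 1/2$ and $m_{CDM}(2) = 1$; they are not the Hernquist and cuspy halo profiles used in the actual models, and the argument names are illustrative.

```python
def projected_mass(R, R_e, L0, f_cdm, x_ml, g, m_cdm,
                   upsilon_star=1.0, L_star=1.0):
    """Return the (stellar, dark) projected masses inside radius R."""
    u = R / R_e                                   # dimensionless radius
    norm = upsilon_star * L_star * (L0 / L_star) ** (1.0 + x_ml)
    stars = norm * g(u)
    dark = norm * g(2.0) * f_cdm / (1.0 - f_cdm) * m_cdm(u)
    return stars, dark

def toy_g(u):
    """Hypothetical enclosed light fraction with g(1) = 1/2."""
    return u / (1.0 + u)

def toy_m_cdm(u):
    """Hypothetical dark matter profile with m_cdm(2) = 1 (mass rising as R)."""
    return u / 2.0

stars, dark = projected_mass(R=2.0, R_e=1.0, L0=1.5, f_cdm=0.4, x_ml=0.25,
                             g=toy_g, m_cdm=toy_m_cdm)
dark_fraction = dark / (stars + dark)
print(f"dark matter fraction inside 2 R_e: {dark_fraction:.3f}")
```

Whatever profiles are supplied, the factor $g(2) f_{CDM}/(1 - f_{CDM})$ guarantees that the dark matter fraction inside $2R_e$ equals the input $f_{CDM}$, which is the property the normalization is designed to enforce.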



As we discussed earlier in §B.4.6, few lenses have sufficient constraints to
estimate all the parameters in such a complex model. However, the
assumption of self-similarity allows the average profile to be constrained statistically
(Rusin et al. 2003, 2004). Suppose we saw lensed images generated by the
same galaxy at a range of different source and lens redshifts. Each observed
lens only reliably measures an aperture mass $M_{ap}(R < R_{Ein}) = \pi \Sigma_c R_{Ein}^2$
where $R_{Ein}$ is the Einstein radius. But the physical scale $R_{Ein}$ varies with
redshift, so the ensemble of the lenses traces out the overall mass profile.
Clearly we do not have ensembles of lenses generated by identical galaxies,
but the assumption of self-similarity allows us to use the same idea for lenses
with a range of luminosities and scale lengths. For 22 lenses with redshifts and
accurate photometry we compared the measured aperture masses to the
predicted aperture masses (the procedure for two-image lenses is a little more
complicated, see Rusin et al. 2003) to estimate all the model parameters.
Fig. B.32 shows the results for the parameters associated with the dark
matter halo. In the limit that $f_{CDM} \to 1$ we find that the mass distribution is
consistent with a simple SIS model ($n \to 2$) almost
independent of the break radius location. There is a slight trend with break
radius because as the break to the steep $\rho \propto r^{-3}$ outer profile gets closer
to the region with the lensed images the inner cusp can be shallower while
keeping the overall profile over the region with images close to isothermal.
As we reduce $f_{CDM}$ and add mass to the stars, the inner cusp becomes
shallower, such that for an NFW ($n = 1$) cusp the dark matter fraction inside $2R_e$
is $\simeq 40\%$. It is interesting to note, however, that the total mass distribution
(light + dark) changes little over the full range of allowed parameters (bottom
panels of Fig. B.32) - lensing constrains the global mass distribution, not how
it is divided into luminous and dark subcomponents. Note the resemblance
of the statistical results to the results for the detailed models of B1933+503.

$^4$ They could also have allowed the CDM fraction to vary as $f_{CDM} \propto L^y$, but these
led to degenerate models where only the combination $x + y$ was constrained.



B.4.9 Stellar Dynamics and Lensing 

Stellar dynamical analyses of gravitational lenses have reached the level of
studies of local galaxies approximately 15-20 years ago. The analyses are
based on the spherical Jeans equations (see Binney & Tremaine 1987) with
simple models of the orbital anisotropy and generally ignore both deviations
from sphericity and higher order moments of the velocity distributions. The
spherical Jeans equation

$$ \frac{1}{\nu}\frac{d(\nu\sigma_r^2)}{dr} + \frac{2\beta(r)\sigma_r^2}{r} = -\frac{G M(r)}{r^2} \qquad (B.90) $$

relates the radial velocity dispersion $\sigma_r = \langle v_r^2\rangle^{1/2}$, the isotropy parameter
$\beta(r) = 1 - \sigma_\theta^2/\sigma_r^2$ characterizing the ratio of the tangential dispersion to
the radial dispersion, the luminosity density of the stars $\nu(r)$ and the mass
distribution $M(r)$. A well known result from dynamics is that you cannot
infer the mass distribution $M(r)$ without constraining the isotropy $\beta(r)$ (e.g.
Binney & Mamon 1982). Models with $\beta = 0$ are called isotropic models (i.e.
$\sigma_r = \sigma_\theta$), while models with $\beta \to 1$ are dominated by radial orbits ($\sigma_\theta \to 0$)
and models with $\beta \to -\infty$ are dominated by tangential orbits ($\sigma_r \to 0$).
These 3D components of the velocity dispersion must then be projected to
measure the line-of-sight velocity dispersion $\langle v_{los}^2\rangle^{1/2}$:

$$ \Sigma(R)\langle v_{los}^2\rangle(R) = 2\int_0^\infty dz\,\nu\left[\frac{z^2}{r^2}\sigma_r^2 + \frac{R^2}{r^2}\sigma_\theta^2\right] = 2\int_0^\infty dz\,\nu\sigma_r^2\left[1 - \frac{R^2}{r^2}\beta(r)\right] \qquad (B.91) $$

where $\Sigma(R) = 2\int_0^\infty dz\,\nu(r)$ is the projected surface brightness and $z$ is the
coordinate along the line of sight.

Modern observations of local galaxies break the degeneracy between mass
and isotropy by measuring higher order moments ($\langle v_{los}^n\rangle$) of the line-of-sight
velocity distribution (LOSVD) because the shape of the LOSVD is affected
by the isotropy of the orbits. Because the velocity dispersions are measured
starting from a Gaussian fit to the LOSVD, the higher order moments are
described by the amplitudes $h_n$ of a decomposition of the LOSVD into
Gauss-Hermite polynomials (e.g. van der Marel & Franx 1993). In general, the rms
velocity (i.e. combining dispersion and rotation) and higher order moment
profiles of early-type galaxies are fairly self-similar, with nearly flat rms
velocity profiles, modest values of $h_4 \simeq 0.01 \pm 0.03$ and slightly radial orbits
$\langle\beta\rangle \simeq 0.1$-$0.2$ (e.g. Romanowsky & Kochanek 1999; Gerhard et al. 2001).

Stellar dynamics is used for two purposes in lensing studies. The first is
to provide a mass normalization for lens models used in studies of lens
statistics. We will discuss this in §B.6. The second is to use comparisons between
a mass estimated from the geometry of a lens and the velocity dispersion
of the lens galaxy to constrain the mass distribution (e.g. Romanowsky &
Kochanek 1999; Trott & Webster 2002; Koopmans & Treu 2002, 2003; Treu
& Koopmans 2002a, 2002b; Koopmans et al. 2003). It is important to
understand that the systematic uncertainties in combining lensing and stellar
dynamics to determine mass distributions are different from using either in
isolation. For local galaxies we measure a velocity dispersion profile. The
normalization of the profile sets the mass scale and the changes in the profile
(and any higher order moments) with radius constrain the mass
distribution. To lowest order, a simple scaling error in the velocity measurements
will lead to errors in the mass scale rather than in the mass distribution.
For lens galaxies, it is the comparison between the velocity dispersion and
the mass determined by the geometry of the images that constrains the mass
distribution. Thus, estimates of the mass distribution are directly affected by
any calibration errors in the velocity dispersions.

We can understand the differences with a simple thought experiment.
Suppose we have a mass distribution $M = M_0 (R/R_0)^x$ in projection and we have
mass estimates $M_1$ at $R_1$ and $M_2$ at $R_2$. Combining them we can solve for
the exponent describing the mass distribution, $x = \ln(M_1/M_2)/\ln(R_1/R_2)$.
In a dynamical observation the mass estimate is some sort of virial estimator
$M \propto \sigma^2 R/G$ while in a lensing measurement it is a direct measurement of $M$.
Standard velocity dispersion measurements start from the best fit Gaussian
line width $\sigma_g$ and then subtract an intrinsic line width $\sigma_c$ due to the instrument
and the intrinsic line width of the star in quadrature to estimate the portion
of the line width due to the motions of the stars. Thus $\sigma^2 = f^2(\sigma_g^2 - \sigma_c^2)$
where $f \simeq 1$ is a scale factor to account for deviations from spherical
symmetry and non-Gaussian line of sight velocity distributions (LOSVDs). In
a purely dynamical study, uncertainties in $f$ and $\sigma_c$ produce bigger
fractional errors in the absolute mass scale $M_0$ than in the exponent $x$. For
example, given measurements $\sigma_1$ and $\sigma_2$ at radii $R_1$ and $R_2$, the exponent,
$x = 1 + \ln(\sigma_1^2/\sigma_2^2)/\ln(R_1/R_2)$, depends only on velocity dispersion ratios in
which calibration errors tend to cancel. This is obvious for the scale factor
$f$, which cancels exactly if it does not vary with radius. Since studies of lens
dynamics use a comparison between a dynamical mass and a lensing mass to
estimate the mass distribution, the results are more sensitive to calibration
problems because these cancellations no longer occur. If we combine a
velocity dispersion measurement $\sigma_1$ with a lensing mass measurement $M_2$ our
estimate of the exponent becomes $x = \ln(\sigma_1^2 R_1/G M_2)/\ln(R_1/R_2)$ and the
uncertainties are linear in the scale factor $f$ rather than canceling. An error
analysis for the effects of $\sigma_c$ is messier, but you again find that the sensitivity
in the mixed lensing and dynamics constraint to errors in $\sigma_c$ is greater than
in a purely dynamical study.
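
The thought experiment is easy to reproduce numerically. The sketch below assumes units with G = 1, takes the virial estimator to be exactly $M = \sigma^2 R/G$ (the text only states proportionality), and uses hypothetical radii and a hypothetical 5% calibration error; the function names are illustrative.

```python
import math

G = 1.0  # work in units where G = 1 (assumption for the sketch)

def slope_dynamics(sigma1, R1, sigma2, R2, f=1.0):
    """Exponent from two dispersions; a common scale factor f cancels."""
    ratio = (f * sigma1) ** 2 / (f * sigma2) ** 2
    return 1.0 + math.log(ratio) / math.log(R1 / R2)

def slope_mixed(sigma1, R1, M2, R2, f=1.0):
    """Exponent from one dispersion and one lensing mass; f does not cancel."""
    return math.log((f * sigma1) ** 2 * R1 / (G * M2)) / math.log(R1 / R2)

# Hypothetical isothermal galaxy: M = R in these units, so x_true = 1.
R1, R2 = 0.5, 1.0
M1, M2 = R1, R2
sigma1 = math.sqrt(G * M1 / R1)
sigma2 = math.sqrt(G * M2 / R2)

f = 1.05  # hypothetical 5% calibration error on the dispersions
x_dyn = slope_dynamics(sigma1, R1, sigma2, R2, f=f)
x_mix = slope_mixed(sigma1, R1, M2, R2, f=f)
print(f"pure dynamics: x = {x_dyn:.3f}")
print(f"lensing + dynamics: x = {x_mix:.3f}")
```

In the mixed estimate the calibration error shifts the exponent by $2\ln f/\ln(R_1/R_2)$, while the purely dynamical estimate is unchanged.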

Velocity dispersions have now been measured for 10 lenses (0047-2808,
Koopmans & Treu 2003; CFRS03.1077, Treu & Koopmans 2004; Q0957+561,
Falco et al. 1997, Tonry & Franx 1999; PG1115+080, Tonry 1998; HST14176+5226,
Ohyama et al. 2002, Gebhardt et al. 2003, Treu & Koopmans 2004; HST15433+5352,
Treu & Koopmans 2004; MG1549+3047, Lehar et al. 1996; B1608+656,
Koopmans et al. 2003; MG2016+112, Koopmans & Treu 2002; Q2237+0305, Foltz
et al. 1992). With the exception of Romanowsky & Kochanek (1999), who
fitted for the distribution function of the lens, the analyses of the data have
used the spherical Jeans equations with parameterized models for the isotropy
$\beta(r)$. They include the uncertainties in $\sigma_c$ about as well as any other
dynamical study, although it is worth bearing in mind that this is tricky because we
lack nearby stars with the appropriate metallicity and the problem of
matching the spectral resolution for the galaxy and the template stars lacks direct
checks of the success of the procedure. A useful rule of thumb to remember is
that repeat measurements of velocity dispersions by different groups almost
always show larger scatter than is consistent with the reported
uncertainties. For example, the three velocity dispersion measurements for the lens
HST14176+5226 (224 ± 15 km/s by Ohyama et al. 2002, 202 ± 9 km/s by
Gebhardt et al. 2003 and 230 ± 14 km/s by Treu & Koopmans 2004) are
mutually consistent only if the uncertainties are broadened by 30%.
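
The 30% figure can be checked with a short calculation using the three quoted measurements. The sketch assumes independent Gaussian errors and tests consistency through the $\chi^2$ about the inverse-variance weighted mean with 2 degrees of freedom, which is one reasonable reading of "mutually consistent" rather than a procedure stated in the text.

```python
# Velocity dispersions (km/s) and formal errors quoted in the text for
# HST14176+5226.
measurements = [(224.0, 15.0), (202.0, 9.0), (230.0, 14.0)]

def chi2_about_weighted_mean(values, scale=1.0):
    """Weighted mean and chi^2 after multiplying every error bar by `scale`."""
    weights = [1.0 / (scale * err) ** 2 for _, err in values]
    mean = sum(w * v for w, (v, _) in zip(weights, values)) / sum(weights)
    chi2 = sum(w * (v - mean) ** 2 for w, (v, _) in zip(weights, values))
    return mean, chi2

mean, chi2 = chi2_about_weighted_mean(measurements)
_, chi2_broad = chi2_about_weighted_mean(measurements, scale=1.3)
n_dof = len(measurements) - 1

print(f"weighted mean = {mean:.1f} km/s")
print(f"chi^2 = {chi2:.2f} for {n_dof} dof with the formal errors")
print(f"chi^2 = {chi2_broad:.2f} for {n_dof} dof with errors broadened by 30%")
```

With the formal errors the $\chi^2$ is about 3.5 for 2 degrees of freedom, and broadening the errors by 30% brings it to about 2.1, close to the expected value of one per degree of freedom.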

In Fig. B.33 we summarize the dynamical constraints for 9 of these
systems using the self-similar mass distribution from Rusin & Kochanek (2004;
Eqn. B.89). This model is very similar to that used by Treu & Koopmans (2004).
For most of the lenses, the region producing a good fit to the combined
lensing and dynamical data overlaps the same region preferred by the Rusin &
Kochanek (2004) self-similar models, shows the same general parameter
degeneracy and is consistent with a simple SIS mass distribution ($f_{CDM} \to 1$ and
$n = 2$). This is particularly true of 0047-2808, HST15433+5352, B1608+656,
MG2016+112 and CFRS03.1077. Only Q2237+0305, where the lens is the
bulge of a nearby spiral and we might not expect this mass model to be
applicable, shows a very different trend (e.g. see the models of Trott &
Webster 2002). PG1115+080 and to a lesser extent MG1549+3047 might
have steeper than isothermal mass distributions (falling rotation curves)
and the possibility of being consistent with a constant mass-to-light ratio
model (Treu & Koopmans 2002a). HST14176+5226 and to a lesser extent
HST15433+5352 could have shallower than isothermal mass distributions
(rising rotation curves). Along the degeneracy direction for each lens we will find
similar mass distributions with very different decompositions into luminous
and dark matter, just as in Fig. B.32. The problem raised by this panorama
is whether it shows that the halo structure is largely homogeneous with some
measurement outliers, or that the structure of early-type galaxies is heterogeneous
with important implications for understanding time delays (§B.5) and galaxy
evolution.

Fig. B.33. Constraints from lens velocity dispersion measurements on the
self-similar mass distributions of Eqn. B.89 and Fig. B.32. The dotted contours show
the 68% and 95% confidence limits from the self-similar models for $R_b/R_e = 50$.
The shaded regions show the models allowed (68% confidence) by the formal
velocity dispersion measurement errors, and the heavy solid lines show contours of
the velocity dispersion in km/s. We used the low Gebhardt et al. (2003)
velocity dispersion for HST14176+5226 because it has the smallest formal error. These
models assumed isotropic orbits, thereby underestimating the full uncertainties in
the stellar dynamical models.

My own view tends toward the first interpretation - that the dynamical
data supports the homogeneity of early-type galaxy structure. The permitted
bands in Fig. B.33 show the 68% confidence regions given the formal
measurement errors and the simple, spherical, isotropic Jeans equation models
- this means that the true 68% confidence regions are significantly larger.
We have already argued that the formal errors on dynamical measurements
tend to be underestimates. For example, the need for HST14176+5226 to
have a rising rotation curve would be considerably reduced if we used the
higher velocity dispersion measurements from Ohyama et al. (2002) or Treu
& Koopmans (2004) or if we broadened the uncertainties by the 30% needed
to make the three estimates statistically consistent. Moreover, the existing
analyses have also neglected the systematic uncertainties arising from the
scaling factor $f$. There are two important issues that make $f \neq 1$. The first
issue is that standard velocity dispersion measurements are the width of the
best fit Gaussian model for the LOSVD, and this is not the same as the
root-mean-square velocity ($\langle v_{los}^2\rangle^{1/2}$) appearing in the Jeans equations used to analyze
the data unless the LOSVD is also a Gaussian. Stellar dynamics has adopted
the dimensionless coefficients $h_n$ of a Gauss-Hermite polynomial series to
model the deviations of the LOSVD from Gaussian, and a typical early-type
galaxy has $|h_4| \lesssim 0.03$ (e.g. Romanowsky & Kochanek 1999). This leads to a
systematic difference between the measured dispersions and the root-mean-square
velocity of $\langle v_{los}^2\rangle^{1/2} \simeq \sigma(1 + \sqrt{6}\,h_4)$ (e.g. van der Marel & Franx 1993), so
$|f - 1| \simeq 7\%$ for $|h_4| \simeq 0.03$. Only the Romanowsky & Kochanek (1999)
models of Q0957+561 and PG1115+080 have properly included this uncertainty.
In fact, Romanowsky & Kochanek (1999) demonstrated that there were
stellar distribution functions in which the mass distribution of PG1115+080 is
both isothermal and agrees with the measured velocity dispersion. While it
is debatable whether these models allowed too much freedom, it is certainly
true that models using the Jeans equations and ignoring the LOSVD have
too little freedom and will overestimate the constraints.
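
The size of the Gauss-Hermite correction follows directly from the relation quoted above. In this small sketch the propagation to a mass error assumes a virial-type estimator with mass proportional to the square of the dispersion, which is an illustrative assumption rather than a statement from the text.

```python
import math

def rms_from_gaussian_width(sigma, h4):
    """Approximate rms line-of-sight velocity from the Gaussian width and h4."""
    return sigma * (1.0 + math.sqrt(6.0) * h4)

h4 = 0.03
shift = rms_from_gaussian_width(1.0, h4) - 1.0   # fractional change in dispersion
mass_shift = (1.0 + shift) ** 2 - 1.0            # if mass scales as dispersion squared

print(f"dispersion shift for h4 = {h4}: {100 * shift:.1f}%")
print(f"implied shift in a virial-type mass: {100 * mass_shift:.1f}%")
```

A 7% shift in the dispersion therefore corresponds to roughly 15% in any mass estimate that scales as the dispersion squared.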

The second issue is that lens galaxies are not spheres. Unfortunately there
are few simple analytic results for oblate or triaxial systems like early-type
galaxies in which the ellipticity is largely due to anisotropies in the velocity
dispersion tensor rather than rotation. For the system as a whole, the tensor
virial theorem provides a simple global relationship between the major and
minor axis velocity dispersions

$$ \frac{\sigma_{major}}{\sigma_{minor}} \simeq 1 + \frac{1}{5}e^2 + \frac{9}{70}e^4 + \cdots \qquad (B.92) $$

for an oblate ellipsoid of axis ratio $q$ and eccentricity $e = (1 - q^2)^{1/2}$ (e.g.
Binney & Tremaine 1987). The velocity dispersion viewed along the major
axis is larger than that on the minor, and the correction can be quite large
since a typical galaxy with $q = 0.7$ will have a ratio $\sigma_{major}/\sigma_{minor} \simeq 1.16$
that is much larger than typical measurement uncertainties. If galaxies are
oblate, this provides no help for the case of PG1115+080 because making
the line-of-sight dispersion too high requires a prolate galaxy. However, it is
a very simple means of shifting HST14176+5226. Crudely, if we start with
the low 209 km/s velocity dispersion and assume that the lens is a $q = 0.7$
galaxy viewed pole on, then $\sigma_{major}/\sigma_{minor} \simeq 1.14$ and the corrections for the
shape are large enough to make HST14176+5226 consistent with the other
systems.
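
The two values of the dispersion ratio quoted for $q = 0.7$ can be compared with a short calculation. The series is Eqn. B.92 truncated after the $e^4$ term. The closed form is an assumption of this sketch: it uses the standard tensor virial result for oblate spheroids stratified on similar ellipsoids, with the ratio of the potential energy tensor components written in terms of the usual $A_1$ and $A_3$ coefficients, and its expansion reproduces the $1/5$ and $9/70$ coefficients of the series.

```python
import math

def ratio_series(q):
    """Major/minor dispersion ratio from the series truncated at e^4."""
    e2 = 1.0 - q * q
    return 1.0 + e2 / 5.0 + 9.0 * e2 ** 2 / 70.0

def ratio_closed(q):
    """Closed-form ratio for an oblate spheroid (assumed tensor virial result)."""
    e = math.sqrt(1.0 - q * q)
    a = math.asin(e) / e
    A1 = a - q                  # common prefactor sqrt(1 - e^2)/e^2 dropped
    A3 = 2.0 * (1.0 / q - a)    # same prefactor dropped
    return math.sqrt(A1 / (A3 * q * q))

for q in (0.95, 0.7):
    print(f"q = {q}: series = {ratio_series(q):.3f}, closed form = {ratio_closed(q):.3f}")
```

For $q = 0.7$ the truncated series gives about 1.14 and the closed form about 1.16, which suggests that the two values quoted in the text differ only through the truncation.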

A final caveat is that neglecting necessary degrees of freedom in your 
lens model can bias inferences from the stellar dynamics of lenses just as 
it can in pure lens modeling. For example, Sand et al. (2002, 2003) used a
comparison of lensed arcs in clusters to velocity dispersion measurements of
the central cluster galaxy to argue that the cluster dark matter distribution
could not have the $\rho \propto 1/r$ cusp of the NFW model for CDM halos. However,
Bartelmann & Meneghetti (2003) and Dalal & Keeton (2003) show that the
data are consistent with an NFW cusp if the lens models include a proper 
treatment of the non-spherical nature of the clusters. This has not been an 
issue in the stellar dynamics of strong lenses where the lens models used to 
determine the mass scale have always included the effects of ellipticity and 
shear, but it is well worth remembering. 

B.5 Time Delays 

Nothing compares to the measurement of the Hubble constant in bringing 
out the worst in astronomers. As we discussed in the previous section on 
lens modeling, many discussions of lens models seem obfuscatory rather than 
illuminating, and the tendency in this direction increases when the models 
are used to estimate $H_0$. In this section we discuss the relationship
between time delay measurements, lens models and $H_0$. All results in the
literature are consistent with this discussion, although it may take you
several days and a series of e-mails to confirm it for any particular paper.
The basic idea is simple. We see images at extrema of the virtual time delay
surface (e.g. Blandford & Narayan 1986, Part 1), so the propagation time from
the source to the observer differs for each image. The differences in
propagation times, known as time delays, are proportional to $H_0^{-1}$
because the distances between the observer, the lens and the source depend on
the Hubble constant (Refsdal 1964). When the source varies, the variations
appear in the images separated by the time delays and the delays are measured
by cross-correlating the light curves. There are recent reviews of time
delays and the Hubble constant by Courbin, Saha & Schechter (2002) and
Kochanek & Schechter (2004). Portions of this section are adapted from
Kochanek & Schechter (2004) since we were completing that review at about the
same time as we presented these lectures.

To begin the discussion we start with our standard simple model, the
circular power law lens from §B.3. As a circular lens, we see two images
at radii $\theta_A$ and $\theta_B$ from the lens center and we will assume
that $\theta_A > \theta_B$ (Fig. B.20). Image A is a minimum, so source
variability will appear in image A first and then with a time delay
$\Delta t$ in the saddle point image B. We can easily fit this data with an
SIS lens model (see Eqns. B.21 and B.22) to find that
$\theta_A = \beta + b$ and $\theta_B = b - \beta$ where
$b = (\theta_A + \theta_B)/2$ is the critical (Einstein) radius of the lens
and $\beta = (\theta_A - \theta_B)/2$ is the source position. The light
travel time for each image relative to a fiducial unperturbed ray is (see
Part 1)

$$ \tau(\vec\theta) = \frac{D_d D_s}{c D_{ds}} \left[ \frac{1}{2} |\vec\theta - \vec\beta|^2 - \Psi(\vec\theta) \right] \tag{B.93} $$

where the effective potential $\Psi = b\theta$ for the SIS lens. Remember
that the distances are comoving angular diameter distances rather than the
more familiar angular diameter distances and this leads to the vanishing of
the extra $1 + z_l$ factor that appears in the numerator if you insist on
using angular diameter distances. The propagation time scales as
$H_0^{-1} \simeq 10h^{-1}$ Gyr because of the $H_0^{-1}$ scalings of the
distances. After substituting our lens model, and differencing the delays
for the two images, we find that

$$ \Delta t_{SIS} = \tau_B - \tau_A = \frac{1}{2} \frac{D_d D_s}{c D_{ds}} \left( \theta_A^2 - \theta_B^2 \right). \tag{B.94} $$

The typical deflection angle $b \sim 3 \times 10^{-6}$ radians (so
$\theta_A^2 \sim 10^{-11}$) converts the $10h^{-1}$ Gyr propagation time into
a time delay of months or years that can be measured by a graduate student.
Naively, this result suggests that the problem of interpreting time delays
to measure $H_0$ is a trivial problem in astrometry.
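
As a rough numerical illustration of the SIS delay,
$\Delta t_{SIS} = \frac{1}{2} (D_d D_s / c D_{ds}) (\theta_A^2 - \theta_B^2)$,
the following sketch evaluates it for a flat $\Lambda$CDM cosmology. The
cosmology ($\Omega_m = 0.3$, $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$), the function
names and the example image radii and redshifts are assumptions made for
illustration only; none of them come from the lectures.

```python
import math

C_KM_S = 299792.458                    # speed of light [km/s]
MPC_KM = 3.0856775814914e19            # one megaparsec [km]
ARCSEC = math.pi / (180.0 * 3600.0)    # one arcsecond [rad]
DAY_S = 86400.0                        # one day [s]


def comoving_distance_mpc(z, h0=70.0, omega_m=0.3, steps=2000):
    """Comoving distance [Mpc] in an assumed flat Lambda-CDM model,
    integrated with Simpson's rule (steps must be even)."""
    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))

    dz = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * dz)
    return (C_KM_S / h0) * total * dz / 3.0


def sis_delay_days(theta_a, theta_b, z_lens, z_src, h0=70.0, omega_m=0.3):
    """SIS time delay [days] for image radii theta_a > theta_b [arcsec]."""
    d_d = comoving_distance_mpc(z_lens, h0, omega_m)
    d_s = comoving_distance_mpc(z_src, h0, omega_m)
    d_ds = d_s - d_d                   # comoving distances add in a flat universe
    scale_s = d_d * d_s / d_ds * MPC_KM / C_KM_S    # D_d D_s / (c D_ds) [s]
    dtheta2 = (theta_a ** 2 - theta_b ** 2) * ARCSEC ** 2
    return 0.5 * scale_s * dtheta2 / DAY_S


# Hypothetical two-image lens: radii 1.2 and 0.8 arcsec, z_l = 0.5, z_s = 2
dt70 = sis_delay_days(1.2, 0.8, 0.5, 2.0, h0=70.0)
dt35 = sis_delay_days(1.2, 0.8, 0.5, 2.0, h0=35.0)
print(f"delay = {dt70:.1f} days; halving H0 scales it by {dt35 / dt70:.3f}")
```

For these made-up inputs the delay comes out at roughly a month, and halving
$H_0$ doubles it, which is the $H_0^{-1}$ scaling that makes the measurement
interesting.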

We can check this assumption by using our general power-law models
from §B.3 instead of an SIS. The power-law models correspond to density
distributions $\rho \propto r^{-n}$, surface densities
$\kappa \propto R^{1-n}$ and circular velocities $v_c \propto r^{(2-n)/2}$ of
which the SIS model is the special case with $n = 2$. These models have
effective potentials

$$ \Psi(\theta) = \frac{b^2}{3-n} \left( \frac{\theta}{b} \right)^{3-n}. \tag{B.95} $$

As we discussed in §B.4.1 we can fit our simple, circular two-image lens
with any of these models to determine $b(n)$ and $\beta(n)$ (Eqn. B.66),
which we can then substitute into the expression for the propagation time to
find that the time delay between the images is

$$ \Delta t(n) = (n-1) \Delta t_{SIS} \left[ 1 - \frac{(2-n)^2}{12} \left( \frac{\delta\theta}{\langle\theta\rangle} \right)^2 + O\left( \left( \frac{\delta\theta}{\langle\theta\rangle} \right)^4 \right) \right] \tag{B.96} $$

where we have expanded the result as a series in the ratio of the thickness
of the radial annulus separating the images,
$\delta\theta = \theta_A - \theta_B$, to their mean radius,
$\langle\theta\rangle = (\theta_A + \theta_B)/2$. While the expansion assumes
that $\delta\theta/\langle\theta\rangle \sim \beta/b$ is small, we can
usually ignore higher order terms even when
$\delta\theta/\langle\theta\rangle$ is of order unity. We now see that the
time delay depends critically on the density profile, with more centrally
concentrated mass distributions (larger values of $n$) producing longer time
delays or implying larger Hubble constants for a fixed time delay.
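
The dependence on the density exponent can be checked numerically. The sketch
below solves the circular power-law lens exactly for a given pair of image
radii, using the deflection $\alpha = b(\theta/b)^{2-n}$ and the potential
$\Psi = \theta\alpha/(3-n)$, and compares the delay with the series
$(n-1)[1 - (2-n)^2 (\delta\theta/\langle\theta\rangle)^2/12]$. The function
names and the test radii are illustrative assumptions, and the exact solver is
restricted to $1 < n < 3$.

```python
def power_law_delay_ratio(theta_a, theta_b, n):
    """Exact Delta t(n) / Delta t_SIS for a circular power-law lens (1 < n < 3).

    theta_a > theta_b > 0 are the image radii; the images lie on opposite
    sides of the lens, so theta_a + theta_b = alpha(theta_a) + alpha(theta_b).
    """
    m = 2.0 - n                                    # alpha = c * theta**m
    c = (theta_a + theta_b) / (theta_a ** m + theta_b ** m)   # c = b**(n - 1)
    alpha_a = c * theta_a ** m
    alpha_b = c * theta_b ** m
    psi_a = theta_a * alpha_a / (3.0 - n)          # Psi = theta * alpha / (3 - n)
    psi_b = theta_b * alpha_b / (3.0 - n)
    # |theta - beta| equals the deflection at each image
    tau_a = 0.5 * alpha_a ** 2 - psi_a
    tau_b = 0.5 * alpha_b ** 2 - psi_b
    return (tau_b - tau_a) / (0.5 * (theta_a ** 2 - theta_b ** 2))


def series_delay_ratio(theta_a, theta_b, n):
    """Second-order series for Delta t(n) / Delta t_SIS."""
    x = (theta_a - theta_b) / (0.5 * (theta_a + theta_b))
    return (n - 1.0) * (1.0 - (2.0 - n) ** 2 / 12.0 * x ** 2)


# Hypothetical image radii with delta_theta / <theta> = 0.5
for n in (1.5, 2.0, 2.5):
    exact = power_law_delay_ratio(1.25, 0.75, n)
    approx = series_delay_ratio(1.25, 0.75, n)
    print(f"n = {n}: exact = {exact:.5f}, series = {approx:.5f}")
```

Even with the annulus half as thick as the mean radius, the series and the
exact result agree to a small fraction of a percent, while the delay itself
changes by a factor of three between $n = 1.5$ and $n = 2.5$.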

The other idealization of the SIS model, the assumption of a circular
lens, turns out to be less critical. A very nice analytic example is to
consider a singular isothermal model with arbitrary angular structure in
which $\kappa = bF(\chi)/2\theta$ where $F(\chi)$ is an arbitrary function of
the azimuthal angle. The singular isothermal ellipsoid (Eqn. B.37) is an
example of this class of potential. For this model family,
$\Delta t = \Delta t_{SIS}$ independent of the actual angular structure
$F(\chi)$ (Witt, Mao & Keeton 2000).
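
One way to see why the angular structure drops out is to note that any
potential of the form $\Psi = \theta G(\chi)$ satisfies
$\vec\theta \cdot \nabla\Psi = \Psi$, so that the Fermat potential of an image
reduces to $(\beta^2 - \theta^2)/2$ whatever $G$ is. The sketch below checks
this identity numerically. The particular $G(\chi)$, its name, and the test
positions are arbitrary choices made here for illustration, and the delay is
in units of $D_d D_s / c D_{ds}$.

```python
import math


def fermat_isothermal(r, chi, g, dg):
    """Return (tau, beta_squared) for an image at polar position (r, chi)
    in a potential Psi = r * g(chi); dg is the derivative of g."""
    a_r, a_chi = g(chi), dg(chi)       # radial and tangential deflection
    # source position beta = theta - alpha in Cartesian components
    bx = (r - a_r) * math.cos(chi) + a_chi * math.sin(chi)
    by = (r - a_r) * math.sin(chi) - a_chi * math.cos(chi)
    tau = 0.5 * (a_r ** 2 + a_chi ** 2) - r * g(chi)
    return tau, bx * bx + by * by


# Arbitrary (made-up) angular structure with Einstein radius scale b = 1
def g(chi):
    return 1.0 + 0.1 * math.cos(2 * chi) + 0.05 * math.sin(3 * chi)


def dg(chi):
    return -0.2 * math.sin(2 * chi) + 0.15 * math.cos(3 * chi)


checks = []
for r, chi in [(1.2, 0.3), (0.8, 2.0), (1.05, 4.5)]:
    tau, beta2 = fermat_isothermal(r, chi, g, dg)
    checks.append(abs(tau - 0.5 * (beta2 - r * r)))
print(max(checks))   # zero up to rounding error
```

Since two images of the same source share $\beta$, the difference of their
Fermat potentials is $(\theta_A^2 - \theta_B^2)/2$, which is the SIS delay.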

B.5.1 A General Theory of Time Delays 

Just as for estimating mass distributions (§B.4), the aspect of modeling time
delays that creates the greatest suspicion is the need to model the
gravitational potential of the lens. Just as for mass distributions, this problem is
largely of our own making, arising from poor communication, understand- 
ing and competition between groups. Here we will use simple mathematical 
expansions to show exactly what properties of the potential determine time 
delays. Any models which have these generic properties have all the degrees 
of freedom needed to properly interpret time delays. This does not, unfortu- 
nately, avoid the problem of degeneracies between the mass models and the 
Hubble constant. 

Fig. B.34. (Top) The PG1115+080 time delays scaled by the astrometric factor
$\theta_i^2 - \theta_j^2$ appearing in $\Delta t_{SIS}$ (Eqn. B.94) as a
function of the leading angular dependence of the time delay
($\sin^2 \Delta\chi_{ij}/2$) (Eqn. B.98). The light solid curve and the
dashed curves show the dependence for the best fit internal shear fraction
$f_{int}$ and its 68% confidence limits. A true external shear $f_{int} = 0$
is shown by the heavy solid curve inside the confidence limits, and the
scaling for an SIE ($f_{int} = 1/4$) is shown by the horizontal line.
(Bottom) The $\chi^2$ goodness of fit for the internal shear fraction
$f_{int}$ from the time delay ratios is shown by the curve with the 68%
confidence region bracketed by the vertical lines. The estimate of $f_{int}$
from the image astrometry is shown by the point with an error bar.

The key to understanding time delays comes from Gorenstein, Falco &
Shapiro (1988; Kochanek 2002a; see also Saha 2000), who showed that the time
delay in a circular lens depends only on the image positions and the surface
density $\kappa(\theta)$ in the annulus between the images. The two lensed
images at radii $\theta_A > \theta_B$ define an annulus bounded by their
radii, with an interior region for $\theta < \theta_B$ and an exterior region
for $\theta > \theta_A$ (Fig. B.20). As we discussed in §B.4.1 the mass in
the interior region is implicit in the image positions and constrained by the
astrometry. From Gauss' law we know that the distribution of the mass in the
interior and the amount or distribution of mass in the exterior region is
irrelevant (see §B.4.3). A useful approximation is to assume that the surface
density in the annulus can be locally approximated by a power law,
$\kappa(\theta) \propto \theta^{1-\eta}$ for $\theta_B < \theta < \theta_A$,
with a mean surface density in the annulus of
$\langle\kappa\rangle = \langle\Sigma\rangle/\Sigma_c$. The time delay
between the images is then (Kochanek 2002a)

$$ \Delta t = 2\Delta t_{SIS} \left[ 1 - \langle\kappa\rangle - \frac{1 - \eta\langle\kappa\rangle}{12} \left( \frac{\delta\theta}{\langle\theta\rangle} \right)^2 + O\left( \left( \frac{\delta\theta}{\langle\theta\rangle} \right)^4 \right) \right] \tag{B.97} $$

where $\langle\theta\rangle = (\theta_A + \theta_B)/2$ and
$\delta\theta = \theta_A - \theta_B$ as before. Thus, the time delay is
largely determined by the average surface density $\langle\kappa\rangle$ in
the annulus with only modest corrections from the local shape of the surface
density distribution even when $\delta\theta/\langle\theta\rangle \simeq 1$.
This second order expansion is exact for an SIS lens
($\langle\kappa\rangle = 1/2$, $\eta = 2$), and it reproduces the time delay
of a point mass lens ($\langle\kappa\rangle = 0$) to better than 1% even when
$\delta\theta/\langle\theta\rangle = 1$. The local model also explains the
scalings of the global power-law models. A $\kappa \propto \theta^{1-n}$
global power law has surface density $\langle\kappa\rangle = (3 - n)/2$ near
the Einstein ring, so the leading term of the time delay is
$\Delta t = 2\Delta t_{SIS}(1 - \langle\kappa\rangle) = (n - 1)\Delta t_{SIS}$
just as in Eqn. B.96.
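
The claim about the point mass can be verified directly. The sketch below
evaluates the second-order annulus expansion,
$2[1 - \langle\kappa\rangle - (1 - \eta\langle\kappa\rangle)(\delta\theta/\langle\theta\rangle)^2/12]$
in units of $\Delta t_{SIS}$, and compares it with the exact point-mass delay
obtained from $\Psi = b^2 \ln\theta$ with $b^2 = \theta_A \theta_B$. The
function names and the test radii are assumptions made for illustration.

```python
import math


def annulus_delay_ratio(kappa_mean, eta, theta_a, theta_b):
    """Second-order expansion of Delta t / Delta t_SIS for mean annular
    surface density kappa_mean and local slope eta."""
    x = (theta_a - theta_b) / (0.5 * (theta_a + theta_b))
    return 2.0 * (1.0 - kappa_mean - (1.0 - eta * kappa_mean) / 12.0 * x ** 2)


def point_mass_delay_ratio(theta_a, theta_b):
    """Exact Delta t / Delta t_SIS for a point mass lens."""
    half_diff = 0.5 * (theta_a ** 2 - theta_b ** 2)
    log_term = theta_a * theta_b * math.log(theta_a / theta_b)
    return (half_diff + log_term) / half_diff


# Images at 1.5 and 0.5 (arbitrary units), so delta_theta / <theta> = 1
approx = annulus_delay_ratio(0.0, 1.0, 1.5, 0.5)   # eta is irrelevant at kappa = 0
exact = point_mass_delay_ratio(1.5, 0.5)
rel_err = abs(approx - exact) / exact
sis = annulus_delay_ratio(0.5, 2.0, 1.5, 0.5)      # SIS limit, equals 1
print(f"point mass: series {approx:.4f}, exact {exact:.4f}, error {rel_err:.2%}")
```

For this extreme geometry the series differs from the exact point-mass delay
by about half a percent, in line with the "better than 1%" statement.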

The role of the angular structure of the lens is easily incorporated into
the expansion through the multipole expansion of §B.4. A quadrupole term
in the potential with dimensionless amplitude $\epsilon_\Psi$ produces ray
deflections of order $O(\epsilon_\Psi b)$ at the Einstein radius $b$ of the
lens. In a four-image lens, the quadrupole deflections are comparable to the
fractional thickness of the annulus,
$\epsilon_\Psi \simeq \delta\theta/\langle\theta\rangle$, while in a
two-image lens they are smaller. For an ellipsoidal density distribution, the
$\cos(2m\chi)$ multipole amplitude is smaller than the quadrupole amplitude
by $\epsilon_\Psi^{m-1} \lesssim (\delta\theta/\langle\theta\rangle)^{m-1}$.
Hence, to lowest order in the expansion we only need to include the internal
and external quadrupoles of the potential but not the changes of the
quadrupoles in the annulus or any higher order multipoles. Remember that what
counts is the angular structure of the potential rather than of the density,
and that potentials are always much rounder than densities with a typical
scaling of $m^{-2} : m^{-1} : 1$ between the potential, deflections and
surface density for the $\cos m\chi$ multipoles (see §B.4).

While the full expansion for independent internal and external quadrupoles
is too complex to be informative, the leading term for the case when the
internal and external quadrupoles are aligned is informative. We have an
internal shear of amplitude $\Gamma$ and an external shear of amplitude
$\gamma$ with $\chi_\gamma = \chi_\Gamma$ as defined in Eqns. B.51 and B.52.
The leading term of the time delay is

$$ \Delta t \simeq 2\Delta t_{SIS} (1 - \langle\kappa\rangle) \frac{\sin^2(\Delta\chi_{AB}/2)}{1 - 4 f_{int} \cos^2(\Delta\chi_{AB}/2)} \tag{B.98} $$

where $\Delta\chi_{AB}$ is the angle between the images (Fig. B.20) and
$f_{int} = \Gamma/(\Gamma + \gamma)$ is the internal quadrupole fraction we
explored in Fig. B.31. We need not worry about a singular denominator -
successful models of the image positions do not allow such configurations.
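
The limiting cases of this angular factor are easy to tabulate. The sketch
below evaluates $\sin^2(\Delta\chi_{AB}/2) / [1 - 4 f_{int} \cos^2(\Delta\chi_{AB}/2)]$,
the factor multiplying $2\Delta t_{SIS}(1 - \langle\kappa\rangle)$; the
function name and the sample angles are assumptions chosen for illustration.

```python
import math


def quadrupole_delay_factor(delta_chi, f_int):
    """Angular factor of the leading-order delay for aligned internal and
    external quadrupoles; delta_chi is the image separation angle [rad]."""
    s2 = math.sin(0.5 * delta_chi) ** 2
    c2 = math.cos(0.5 * delta_chi) ** 2
    return s2 / (1.0 - 4.0 * f_int * c2)


for delta_chi in (1.0, 2.0, 3.0):
    external = quadrupole_delay_factor(delta_chi, 0.0)    # pure external shear
    isothermal = quadrupole_delay_factor(delta_chi, 0.25)  # SIE-like, f_int = 1/4
    print(f"angle {delta_chi:.1f} rad: f_int=0 -> {external:.3f}, "
          f"f_int=1/4 -> {isothermal:.3f}")
```

With $f_{int} = 1/4$ the factor is unity at every angle, while for pure
external shear it follows $\sin^2(\Delta\chi_{AB}/2)$; for images on opposite
sides of the lens ($\Delta\chi_{AB} = \pi$) it is unity whatever $f_{int}$ is.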

A two-image lens has too few astrometric constraints to fully constrain
a model with independent, misaligned internal and external quadrupoles.
Fortunately, when the lensed images lie on opposite sides of the lens galaxy
($\Delta\chi_{AB} = \pi + \delta$ with $|\delta| \ll 1$), the time delay
becomes insensitive to the quadrupole structure. Provided the angular
deflections are smaller than the radial deflections
($|\delta|\langle\theta\rangle \lesssim \delta\theta$), the leading term of
the time delay reduces to the result for a circular lens,
$\Delta t = 2\Delta t_{SIS}(1 - \langle\kappa\rangle)$, if we minimize the
total shear of the lens. In the minimum shear solution the shear converges to
the invariant shear ($\gamma_1$) and the other shear component
$\gamma_2 = 0$ (see §B.4.5). If, however, you allow the other shear component
to be non-zero, then you find that
$\Delta t = 2\Delta t_{SIS}(1 - \langle\kappa\rangle - \gamma_2)$ to lowest
order - the second shear component acts like a contribution to the
convergence. In the absence of any other constraints, this adds a modest
additional uncertainty (5-10%) to interpretations of time delays in two-image
lenses. To first order its effects should average out in an ensemble of
lenses because the extra shear has no preferred sign.

A four-image lens has more astrometric constraints and can constrain
a model with independent, misaligned internal and external quadrupoles
- this was the basis of the Turner et al. (2004) summary of the internal
to total quadrupole ratios shown in Fig. B.31. If the external shear
dominates, then $f_{int} \simeq 0$ and the leading term of the delay becomes
$\Delta t = 2\Delta t_{SIS}(1 - \langle\kappa\rangle) \sin^2 \Delta\chi_{AB}/2$.
If the model is isothermal, like the $\Psi = \theta F(\chi)$ model we
introduced in Eqn. B.42, then $f_{int} = 1/4$ and we obtain the
Witt et al. (2000) result that the time delay is independent of the angle
between the images, $\Delta t \simeq 2\Delta t_{SIS}(1 - \langle\kappa\rangle)$.
Thus, delay ratios in a four-image lens are largely determined by the angular
structure and provide a check on the potential model. Unfortunately, the only
lens with precisely measured delay ratios, B1608+656 (Fassnacht et al. 2002),
also has two galaxies inside the Einstein ring and is a poor candidate for a
simple multipole treatment (although it is dominated by an internal
quadrupole as expected, see Fig. B.31). The delay ratios for PG1115+080 are
less well measured (Schechter et al. 1997; Barkana 1997; Chartas 2003), but
should be dominated by external shear since the estimate from the image
astrometry is that $f_{int} = 0.083$ ($0.055 < f_{int} < 0.111$ at 95%
confidence). Fig. B.34 shows the dependence of the PG1115+080 delays on the
leading angular dependence of the time delay (Eqn. B.98) after scaling out
the standard astrometry factor for the different radii of the images
(Eqn. B.94). Formally, the estimate from the time delays that
$f_{int} = -0.02$ ($-0.09 < f_{int} < 0.03$ at 68% confidence) is a little
discrepant, but the two estimates agree at the 95% confidence level and there
are still some systematic uncertainties in the shorter optical delays of
PG1115+080. Changes in $f_{int}$ between lenses are the reason Saha (2004)
found significant scatter between time delays scaled only by
$\Delta t_{SIS}$, since the time delay lenses range from external shear
dominated systems like PG1115+080 to internal shear dominated systems like
B1608+656.

B.5.2 Time Delay Lenses in Groups or Clusters 

Most galaxies are not isolated, and many early-type lens galaxies are
members of groups or clusters, so we need to consider the effects of the
local environment on the time delays. Weak perturbations are easily
understood since they will simply be additional contributions to the surface
density ($\kappa_c$) and the external shear/quadrupole ($\gamma_c$) we
discussed in §B.4. In general the effects of the external shear $\gamma_c$
are minimal because they either have little effect on the delays (two-image
lenses) or are tightly constrained by either the astrometry or delay ratios
(four-image lenses or systems with lensed host galaxies, see §B.10). The
problems arise from either the degeneracies associated with the surface
density $\kappa_c$ or the need for a complete, complicated cluster model.

The problem with $\kappa_c$ is the infamous mass-sheet degeneracy (Part 1,
Falco, Gorenstein & Shapiro 1985). If we have a model predicting a time delay
$\Delta t_0$ and add a sheet of constant surface density $\kappa_c$, then the
time delay is changed to $(1 - \kappa_c)\Delta t_0$ without changing the
image positions, flux ratios, or time delay ratios. Its effects can be
understood from §B.5.1 as a contribution to the annular surface density with
$\langle\kappa\rangle = \kappa_c$ and $\eta = 1$. Its only observable effect
other than that on the time delays is a reduction in the mass of the lens
galaxy that could be detected given an independent estimate of the lens
galaxy's mass such as a velocity dispersion (e.g. §B.4.9, see
Romanowsky & Kochanek 1998 for an attempt to do this for Q0957+561).
It can also be done given an independent estimate of the properties of the
group or cluster using weak lensing (e.g. Fischer et al. 1997 in Q0957+561),
cluster galaxy velocity dispersions (e.g., Angonin-Willaime, Soucail, &
Vanderriest 1994 for Q0957+561, Hjorth et al. 2002 for RXJ0911+0551) or X-ray
temperatures/luminosities (e.g., Morgan et al. 2001 for RXJ0911+0551 or
Chartas et al. 2002 for Q0957+561). The accuracy of these methods is
uncertain at present because each suffers from its own systematic
uncertainties, and they probably cannot supply the 10% or better precision
measurements of $\kappa_c$ needed to strongly constrain models. When the
convergence is due to an object like a cluster, there is a strong correlation
between the convergence $\kappa_c$ and the shear $\gamma_c$ that is
controlled by the density distribution of the cluster (for an isothermal
model $\kappa_c = \gamma_c$). When the lens is in the outskirts of a cluster,
as in RXJ0911+0551, it is probably reasonable to assume that
$\kappa_c \leq \gamma_c$ as most mass distributions are more centrally
concentrated than isothermal (see Eqn. IB.Sl). Neglecting the extra surface
density coming from nearby objects (galaxies, groups, clusters) leads to an
overestimate of the Hubble constant, because these objects all have
$\kappa_c > 0$. For most time delay systems this correction is probably
$\lesssim 10\%$.
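
The mass-sheet degeneracy can be demonstrated with a one-dimensional SIS
example. In the sketch below the transformed model has potential
$\Psi' = (1 - \kappa_c)\Psi + \kappa_c\theta^2/2$ and source position
$\beta' = (1 - \kappa_c)\beta$; the same image positions satisfy the new lens
equation, and the delay is rescaled by $(1 - \kappa_c)$. The numerical values
of $b$, $\beta$ and $\kappa_c$ are arbitrary assumptions, and delays are in
units of $D_d D_s / c D_{ds}$.

```python
def fermat(theta, beta, psi):
    """Fermat potential of an image at signed position theta."""
    return 0.5 * (theta - beta) ** 2 - psi


b, beta, kappa_c = 1.0, 0.3, 0.1          # made-up SIS lens and sheet
theta_a, theta_b = beta + b, beta - b      # SIS images on opposite sides

# Original SIS model: Psi = b |theta|
dt0 = (fermat(theta_b, beta, b * abs(theta_b))
       - fermat(theta_a, beta, b * abs(theta_a)))


def psi_sheet(theta):
    """Potential after adding the sheet and rescaling the SIS."""
    return (1.0 - kappa_c) * b * abs(theta) + 0.5 * kappa_c * theta ** 2


def alpha_sheet(theta):
    """Deflection of the transformed model."""
    sign = 1.0 if theta > 0 else -1.0
    return (1.0 - kappa_c) * b * sign + kappa_c * theta


beta_new = (1.0 - kappa_c) * beta
# The unchanged image positions still solve the lens equation
residuals = [th - alpha_sheet(th) - beta_new for th in (theta_a, theta_b)]
dt1 = (fermat(theta_b, beta_new, psi_sheet(theta_b))
       - fermat(theta_a, beta_new, psi_sheet(theta_a)))
print(residuals, dt1 / dt0)
```

Because $H_0$ is inferred from the ratio of the predicted to the measured
delay, a model that omits the sheet overestimates $H_0$ by the factor
$1/(1 - \kappa_c)$.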

Table B.1. Time Delay Measurements

System         $N_{im}$   $\Delta t$ (days)   Astrometry   Model               Ref.
HE1104-1805    2          161 ± 7             +            "simple"            1
PG1115+080     4          25 ± 2              +            "simple"            2
SBS1520+530    2          130 ± 3             +            "simple"            3
B1600+434      2          51 ± 2              +/-          "simple"            4
HE2149-2745    2          103 ± 12            +            "simple"            5
RXJ0911+0551   4          146 ± 4             +            cluster/satellite   6
Q0957+561      2          417 ± 3             +            cluster             7
B1608+656      4          77 ± 2              +/-          satellite           8
B0218+357      2          10.5 ± 0.2          -            "simple"            9
PKS1830-211    2          26 ± 4              -            "simple"            10
B1422+231      4          (8 ± 3)             +            "simple"            11

$N_{im}$ is the number of images. $\Delta t$ is the longest of the measured
delays and its $1\sigma$ error; delays in parentheses require further
confirmation. The "Astrometry" column indicates the quality of the
astrometric data for the system: + (good), +/- (some problems), - (serious
problems). The "Model" column indicates the type of model needed to interpret
the delays. "Simple" lenses can be modeled as a single primary lens galaxy in
a perturbing tidal field. More complex models are needed if there is a
satellite galaxy inside the Einstein ring ("satellite") of the primary lens
galaxy, or if the primary lens belongs to a cluster. References: (1) Ofek &
Maoz 2003, also Wyrzykowski et al. 2003 (2) Barkana 1997, based on Schechter
et al. 1997 (3) Burud et al. 2002b (4) Burud et al. 2000, also Koopmans et
al. 2000 (5) Burud et al. 2002a (6) Hjorth et al. 2002 (7) Kundic et al.
1997, also Schild & Thomson 1997 and Haarsma et al. 1999 (8) Fassnacht et al.
2002 (9) Biggs et al. 1999, also Cohen et al. 2000 (10) Lovell et al. 1998
(11) Patnaik & Narasimha 2001

If the cluster or any member galaxies are sufficiently close, then we cannot
ignore the higher-order perturbations in the expansion of Eqn. B.26. This is
certainly true for Q0957+561 (as discussed in §B.4.6) where the lens galaxy is
the brightest cluster galaxy and located very close to the center of the
cluster. It is easy to gauge when they become important by simply comparing
the deflections produced by any higher order moments of the cluster beyond
the quadrupole with the uncertainties being used for the image positions. For
a cluster of critical radius $b_c$ at distance $\theta_c$ from a lens of
Einstein radius $b$, these perturbations are of order
$b_c (b/\theta_c)^2 \sim b\gamma_c (b/\theta_c)$. Because the astrometric
precision of the measurements is so high, these higher order terms can be
relatively easy to detect. For example, models of PG1115+080 (e.g. Impey
et al. 1998) find that using a group potential near the optical centroid of
the nearby galaxies produces a better fit than simply using an external
shear. In this case the higher order terms are fairly small and affect the
results little, but results become very misleading if they are important but
ignored.
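
The comparison suggested above, between the higher-order cluster deflection
$b_c (b/\theta_c)^2$ and the astrometric uncertainty, takes one line to
evaluate. The numbers in the sketch below are hypothetical and are not taken
from any system in Table B.1; only the scaling itself comes from the text.

```python
def higher_order_deflection(b, b_c, theta_c):
    """Order-of-magnitude deflection from cluster terms beyond the
    quadrupole, b_c (b / theta_c)^2, in the units of the inputs."""
    return b_c * (b / theta_c) ** 2


# Hypothetical configuration (all angles in arcsec)
b = 1.0                  # Einstein radius of the lens galaxy
b_c = 10.0               # critical radius of the cluster
theta_c = 30.0           # distance of the cluster center from the lens
sigma_astrometry = 0.005  # assumed positional uncertainty

perturbation = higher_order_deflection(b, b_c, theta_c)
needs_cluster_model = perturbation > sigma_astrometry
print(f"perturbation ~ {perturbation:.4f} arcsec, "
      f"full cluster model needed: {needs_cluster_model}")
```

In this made-up case the perturbation is about twice the astrometric error,
so a pure external shear would not be an adequate description.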

B.5.3 Observing Time Delays and Time Delay Lenses 

The first time delay measurement, for the gravitational lens Q0957+561,
was reported in 1984 (Florentin-Nielsen 1984). Unfortunately, a controversy
then developed between a short delay ($\simeq 1.1$ years, Schild & Cholfin
1986; Vanderriest et al. 1989) and a long delay ($\simeq 1.5$ years, Press,
Rybicki, & Hewitt 1992a, 1992b), which was finally settled in favor of the
short delay only after 5 more years of effort (Kundic et al. 1997, also
Schild & Thomson 1997 and Haarsma et al. 1999). Factors contributing to the
intervening difficulties included the small amplitude of the variations,
systematic effects which, with hindsight, appear to be due to microlensing,
and scheduling difficulties (both technical and sociological).

Fig. B.35. VLA monitoring data for the four-image lens B1608+656. The top
panel shows (from top to bottom) the normalized light curves for the B (filled
squares), A (open diamonds), C (filled triangles) and D (open circles) images
as a function of the mean Julian day. The bottom panel shows the composite
light curve for the first monitoring season after cross correlating the light
curves to determine the time delays ($\Delta t_{AB} = 31.5 \pm 1.5$,
$\Delta t_{CB} = 36.0 \pm 1.5$ and $\Delta t_{DB} = 77.0 \pm 1.5$ days)
and the flux ratios. (From Fassnacht et al. 2002)

While the long-running controversy over Q0957+561 led to poor publicity
for the measurement of time delays, it allowed the community to come to an
understanding of the systematic problems in measuring time delays, and to
develop a broad range of methods for reliably determining time delays from
typical data. Only the sociological problem of conducting large monitoring
projects remains as an impediment to the measurement of time delays in
large numbers. Even these are slowly being overcome, with the result that
the last five years have seen the publication of time delays in 11 systems
(see Table B.1).

The basic procedures for measuring a time delay are simple. A monitoring
campaign must produce light curves for the individual lensed images that are
well sampled compared to the time delays. During this period, the source
quasar in the lens must have measurable brightness fluctuations on time
scales shorter than the monitoring period. The resulting light curves are
cross correlated by one or more methods to measure the delays and their
uncertainties (e.g., Press et al. 1992a, 1992b; Beskin & Oknyanskij; Pelt
et al. 1996; references in Table B.1). Care must be taken because there can
be sources of uncorrelated variability between the images due to systematic
errors in the photometry and real effects such as microlensing of the
individual images (e.g., Koopmans et al. 2000; Burud et al. 2002b; Schechter
et al. 2003). Figure B.35 shows an example, the beautiful light curves from
the radio lens B1608+656 by Fassnacht et al. (2002), where the variations of
all four lensed images have been traced for over three years. One of the 11
systems, B1422+231, is limited by systematic uncertainties in the delay
measurements.
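
To make the cross-correlation step concrete, here is a minimal sketch of one
simple estimator: shift one light curve by a trial delay, interpolate the
other onto the shifted epochs, remove the mean magnitude offset (the flux
ratio), and pick the delay that minimizes the residual variance. This is not
the method of any of the papers cited above; the light curves are synthetic,
and the function names, sampling, noise level and the 30 day input delay are
all assumptions made for illustration.

```python
import math
import random


def interp(t, times, values):
    """Linear interpolation; returns None outside the sampled range."""
    if t < times[0] or t > times[-1]:
        return None
    lo, hi = 0, len(times) - 1
    while hi - lo > 1:                      # bisection for the bracketing pair
        mid = (lo + hi) // 2
        if times[mid] <= t:
            lo = mid
        else:
            hi = mid
    w = (t - times[lo]) / (times[hi] - times[lo])
    return (1.0 - w) * values[lo] + w * values[hi]


def delay_score(delay, t_a, m_a, t_b, m_b, min_overlap=20):
    """Variance of m_B(t) - m_A(t - delay) after removing the mean offset."""
    diffs = []
    for t, m in zip(t_b, m_b):
        ref = interp(t - delay, t_a, m_a)
        if ref is not None:
            diffs.append(m - ref)
    if len(diffs) < min_overlap:
        return float("inf")
    mean = sum(diffs) / len(diffs)
    return sum((d - mean) ** 2 for d in diffs) / len(diffs)


def estimate_delay(t_a, m_a, t_b, m_b, trial_delays):
    """Trial delay with the smallest residual variance."""
    return min(trial_delays, key=lambda d: delay_score(d, t_a, m_a, t_b, m_b))


def source(t):
    """Made-up intrinsic variability [mag]."""
    return (0.3 * math.sin(2 * math.pi * t / 170.0)
            + 0.2 * math.sin(2 * math.pi * t / 65.0 + 1.0)
            + 0.1 * math.sin(2 * math.pi * t / 23.0 + 0.5))


rng = random.Random(1)
times = [3.0 * i for i in range(150)]        # one epoch every 3 days
mag_a = [source(t) + rng.gauss(0.0, 0.005) for t in times]
mag_b = [source(t - 30.0) + 0.5 + rng.gauss(0.0, 0.005) for t in times]
trials = [0.5 * k for k in range(121)]       # trial delays of 0 to 60 days
best = estimate_delay(times, mag_a, times, mag_b, trials)
print(f"recovered delay: {best} days")
```

Real data add the complications mentioned above, in particular uncorrelated
variability from microlensing, which this sketch does not model.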

We want to have uncertainties in the time delay measurements that are
unimportant for the estimates of $H_0$. For the present, uncertainties of
order 3%-5% are adequate (so improved delays are still needed for PG1115+080,
HE2149-2745, and PKS1830-211). In a four-image lens we can measure three
independent time delays, and the dimensionless ratios of these delays
provide additional constraints on the lens models (see §B.5.1). These ratios
are well measured in B1608+656 (Fassnacht et al. 2002), poorly measured in
PG1115+080 (Barkana 1997; Schechter et al. 1997; Chartas 2003) and
unmeasured in either RXJ0911+0551 or B1422+231. Using the time delay lenses
as very precise probes of $H_0$, dark matter and cosmology will eventually
require still smaller delay uncertainties ($\simeq 1\%$). Once a delay is
known to 5%, it is relatively easy to reduce the uncertainties further
because we can accurately predict when flux variations will appear in the
other images, and hence when the lens will need to be monitored more
intensively.

The expression for the time delay in an SIS lens (Eqn. B.94) reveals the
other data that are necessary to interpret time delays. First, the source and
lens redshifts are needed to compute the distance factors that set the scale
of the time delays. Fortunately, we know both redshifts for all 11 systems in
Table B.1 even though missing redshifts are a problem for the lens sample
as a whole (see §B.2). The dependence of the distances $D_d$, $D_s$ and
$D_{ds}$ on the cosmological model is unimportant until our total
uncertainties approach 5%. Second, we require accurate relative positions for
the images and the lens galaxy. These uncertainties are always dominated by
the position of the lens galaxy relative to the images. For most of the
lenses in Table B.1, observations with radio interferometers (VLA, Merlin,
VLBA) and HST have measured the relative positions of the images and lenses
to accuracies $\lesssim 0.005''$. Sufficiently deep HST images can obtain the
necessary data for almost any lens, but dust in the lens galaxy (as seen in
B1600+434 and B1608+656) can limit the accuracy of the measurement even in a
very deep image. For B0218+357 and PKS1830-211, however, the position of the
lens galaxy relative to the images is not known to sufficient precision or
determined only from models (see Biggs et al. 1999; Lehar et al. 2000;
Courbin et al. 2002; Winn et al. 2002b; Wucknitz, Biggs & Browne 2004;
York et al.).
We can also divide the systems by the complexity of the required lens 
model. We define eight of the lenses as "simple," in the sense that the avail- 
able data suggests that a model consisting of a single primary lens in a 
perturbing shear (tidal gravity) field should be an adequate representation 
of the gravitational potential. In some of these cases, an external potential 
representing a nearby galaxy or parent group will improve the fits, but the 
differences between the tidal model and the more complicated perturbing 
potential are small (see 3B.5.2jl . If we neglect the convergence produced by 
the group, then H may be overestimated. We include the quotation marks 
because the classification is based on an impression of the systems from the 
available data and models. Remember also that there are convergence fluc- 
tuations along the line of sight that add a low level of cosmic variance to 
the time delays of individual lenses f ilB.4.2l and Fig. IB.lQjl . While we cannot 
guarantee that a system is simple, we can easily recognize two complications 
that will require more complex models. 

The first complication is that some primary lenses have less massive satel- 
lite galaxies inside or near their Einstein rings. This includes two of the time 
delay lenses, RXJ0911+0551 and B1608+656. RXJ0911+0551 could simply 
be a projection effect, since neither lens galaxy shows irregular isophotes. 
Here the implication for models may simply be the need to include all the 
parameters (mass, position, ellipticity, ...) required to describe the second
lens galaxy, and with more parameters we would expect greater uncertainties in
$H_0$. In B1608+656, however, the lens galaxies show the disturbed isophotes of
dusty galaxies possibly undergoing a disruptive interaction. How one should 
model such a system is unclear. If there was once dark matter associated with 
each of the galaxies, how is it distributed now? Is it still associated with the 
individual galaxies? Has it settled into an equilibrium configuration? While 
B1608+656 can be well fit with standard lens models (Fassnacht et al. 2002;
Koopmans et al. 2003), these complications have yet to be explored in detail.

The second complication occurs when the primary lens is a member of
a more massive (X-ray) cluster, as in the time delay lenses RXJ0911+0551
(Morgan et al. 2001) and Q0957+561 (Chartas et al. 2002). The cluster model
is critical to interpreting these systems (see §B.5.2). The cluster surface
density at the position of the lens ($\kappa_c \gtrsim 0.2$) leads to large
corrections to the time delay estimates and the higher-order perturbations
are crucial to obtaining a good model. For example, models in which the
Q0957+561 cluster was treated simply as an external shear were grossly
incorrect (see Keeton et al. 2000). In addition to the uncertainties in the
cluster model itself, we must also decide how to include and model the other
cluster galaxies
near the primary lens. Thus, lenses in clusters have many reasonable degrees 
of freedom beyond those of the "simple" lenses. 

B.5.4 Results: The Hubble Constant and Dark Matter 

With our understanding of the theory and observations of the lenses we 
will now explore their implications for $H_0$. We focus on the "simple" lenses
PG1115+080, SBS1520+530, B1600+434, and HE2149-2745. We only comment on the
interpretation of the HE1104-1805 delay because the measurement is too recent
to have been interpreted carefully. We will briefly discuss
the more complicated systems B0218+357, RXJ0911+0551, Q0957+561, and 
B1608+656, and we will not discuss the systems with problematic time delays 
or astrometry. 

The most common, simple, realistic model of a lens consists of a singular
isothermal ellipsoid (SIE) in an external (tidal) shear field (see §B.4). The
model has 7 parameters (the lens position, mass, ellipticity, major axis
orientation for the SIE, and the shear amplitude and orientation). It has many
degrees of freedom associated with the angular structure of the potential,
but the radial structure is fixed with $\langle\kappa\rangle \simeq 1/2$. For
comparison, a two-image (four-image) lens supplies 5 (13) constraints on any
model of the potential: 2 (6) from the relative positions of the images,
1 (3) from the flux ratios of the images, 0 (2) from the inter-image time
delay ratios, and 2 from the lens position. With the addition of extra
components (satellites/clusters) for the more complex lenses, this basic
model provides a good fit to all the time delay lenses except Q0957+561.
Although a naive counting of the degrees of freedom ($N_{dof} = -2$ and 6,
respectively) suggests that estimates of $H_0$ would be under constrained for
two-image lenses and over constrained for four-image lenses, the
uncertainties are actually dominated by those of the time delay measurements
and the astrometry in both cases. This is what we expect from §B.5.1: the
model has no degrees of freedom that change $\langle\kappa\rangle$ or $\eta$,
so there will be little contribution to the uncertainties in $H_0$ from the
model for the potential.
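
The constraint tally above generalizes to any number of images. The short
sketch below reproduces the counts quoted in the text for two-image and
four-image lenses; the function and constant names are illustrative
assumptions.

```python
def constraint_count(n_images):
    """Constraints supplied by an n-image lens, following the tally in the
    text: relative positions, flux ratios, delay ratios, lens position."""
    positions = 2 * (n_images - 1)
    flux_ratios = n_images - 1
    delay_ratios = max(n_images - 2, 0)
    lens_position = 2
    return positions + flux_ratios + delay_ratios + lens_position


# SIE in external shear: lens position (2), mass, ellipticity, major axis
# orientation, shear amplitude, shear orientation
SIE_PLUS_SHEAR_PARAMETERS = 7


def naive_dof(n_images):
    """Naive degrees of freedom for the SIE plus shear model."""
    return constraint_count(n_images) - SIE_PLUS_SHEAR_PARAMETERS


for n in (2, 4):
    print(n, constraint_count(n), naive_dof(n))
```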

If we use a model that includes parameters to control the radial density
profile (i.e., $\langle\kappa\rangle$), for example by adding a halo
truncation radius $a$ to the SIS profile [the pseudo-Jaffe model,
$\rho \propto r^{-2}(r^2 + a^2)^{-1}$; e.g., Impey et al. 1998;
Burud et al. 2002a],⁵ then we find the expected correlation between $a$ and
$H_0$: as we make the halo more concentrated (smaller $a$), the estimate of
$H_0$ rises from the value for the SIS profile ($\langle\kappa\rangle = 1/2$
as $a \to \infty$) to the value for a point mass ($\langle\kappa\rangle = 0$
as $a \to 0$), with the fastest changes occurring when $a$ is similar to the
Einstein radius of the lens. We show an example of such a model for
PG1115+080 in Figure B.36. This case is somewhat more complicated than a pure
pseudo-Jaffe model because there is an additional contribution to the surface
density from the group to which the lens galaxy belongs. As long as the
structure of the radial density profile is fixed (constant $a$), the
uncertainties are again dominated by the uncertainties in the time delay.
Unfortunately, the goodness of fit, $\chi^2(a)$, shows too little dependence
on $a$ to determine $H_0$ uniquely. In general, two-image lenses have too few
constraints, and the extra constraints supplied by a four-image lens
constrain the angular structure rather than the radial structure of the
potential. This basic problem holds for all existing models of the current
sample of time delay lenses.

The inability of the present time delay lenses to directly constrain the
radial density profile is the major problem for using them to determine
$H_0$. Fortunately, it is a consequence of the available data on the current
sample rather than a fundamental limitation. It is, however, a simple
trade-off: models with less dark matter (lower $\langle\kappa\rangle$, more
centrally concentrated densities) produce higher Hubble constants than those
with more dark matter. We do have some theoretical limits on the value of
$\langle\kappa\rangle$. In particular, we can be confident that the surface
density is bounded by two limiting models. The mass distribution should not
be more compact than the luminosity distribution, so a constant mass-to-light
ratio ($M/L$) model should set a lower limit on
$\langle\kappa\rangle \gtrsim \langle\kappa\rangle_{M/L} \simeq 0.2$, and an
upper limit on estimates of $H_0$. We are also confident that the typical
lens should not have a rising rotation curve at 1-2 optical effective radii
from the center of the lens galaxy. Thus, the SIS model is probably the least
concentrated reasonable model, setting an upper bound on
$\langle\kappa\rangle \lesssim \langle\kappa\rangle_{SIS} = 1/2$, and a lower
limit on estimates of $H_0$. Figure B.37 shows joint estimates of $H_0$ from
the four simple lenses for these two limiting mass distributions (Kochanek
2003b). The results for the individual lenses are mutually consistent and are
unchanged by the new $0.149 \pm 0.004$ day delay for the $A_1$-$A_2$ images
in PG1115+080 (Chartas 2003). For galaxies with isothermal profiles we find
$H_0 = 48 \pm 3$ km s$^{-1}$ Mpc$^{-1}$, and for galaxies with constant $M/L$
we find $H_0 = 71 \pm 3$ km s$^{-1}$ Mpc$^{-1}$. While our best prior
estimate for the mass distribution is the isothermal profile (see §B.4.6), the
lens galaxies would have to have constant $M/L$ to match the Key Project
estimate of $H_0 = 72 \pm 8$ km s$^{-1}$ Mpc$^{-1}$ (Freedman et al. 2001) or
the WMAP estimate of $H_0 = 72 \pm 5$ km s$^{-1}$ Mpc$^{-1}$ for a flat
universe with a cosmological constant (Spergel et al. 2003).

$^5$ This is simply an example. The same behavior would be seen for any other
parametric model in which the radial density profile can be adjusted.

Fig. B.36. $H_0$ estimates for PG1115+080. The lens galaxy is modeled as an
ellipsoidal pseudo-Jaffe model, $\rho \propto r^{-2}(r^2+a^2)^{-1}$, and the
nearby group is modeled as an SIS. As the break radius $a \to \infty$ the
pseudo-Jaffe model becomes an SIS model, and as the break radius $a \to 0$ it
becomes a point mass. The heavy solid curve ($h_{\rm exact}$) shows the
dependence of $H_0$ on the break radius for the exact, nonlinear fits of the
model to the PG1115+080 data. The heavy dashed curve ($h_{\rm scaling}$) is
the value found using our simple theory (§B.5.1) of time delays. The agreement
of the exact and scaling solutions is typical. The light solid line shows the
average surface density $\langle\kappa\rangle$ in the annulus between the
images, and the light dashed line shows the inverse of the logarithmic slope
$\eta$ in the annulus ($\kappa \propto \theta^{1-\eta}$). For an SIS model we
would have $\langle\kappa\rangle = 1/2$ and $\eta^{-1} = 1/2$, as shown by the
horizontal line. When the break radius is large compared to the Einstein
radius (indicated by the vertical line), the surface density is slightly
higher and the slope is slightly shallower than for the SIS model because of
the added surface density from the group. As we make the lens galaxy more
compact by reducing the break radius, the surface density decreases and the
slope becomes steeper, leading to a rise in $H_0$. As the galaxy becomes very
compact, the surface density near the Einstein ring is dominated by the group
rather than the galaxy, so the surface density approaches a constant and the
logarithmic slope approaches the value corresponding to a constant density sheet

Fig. B.37. $H_0$ likelihood distributions. The curves show the joint
likelihood functions for $H_0$ using the four simple lenses PG1115+080,
SBS1520+530, B1600+434, and HE2149-2745 and assuming either an SIS model (high
$\langle\kappa\rangle$, flat rotation curve) or a constant $M/L$ model (low
$\langle\kappa\rangle$, declining rotation curve). The heavy dashed curves
show the consequence of including the X-ray time delay for PG1115+080 from
Chartas (2003) in the models. The light dashed curve shows a Gaussian model
for the Key Project result that $H_0 = 72 \pm 8$ km s$^{-1}$ Mpc$^{-1}$.

The difference between these two limits is entirely explained by the
differences in $\langle\kappa\rangle$ and $\eta$ between the SIS and constant
$M/L$ models. In fact, it is possible to reduce the $H_0$ estimates for each
simple lens to an approximation formula,
$H_0 = A(1 - \langle\kappa\rangle) + B\langle\kappa\rangle(\eta - 1)$. The
coefficients, $A$ and $|B| \approx A/10$, are derived from the image positions
and the time delay using the simple theory from §B.5.1. These approximations
reproduce numerical results using ellipsoidal lens models to accuracies of
3 km s$^{-1}$ Mpc$^{-1}$ (Kochanek 2002a). For example, in Figure B.36 we also
show the estimate of $H_0$ computed based on the simple theory of §B.5.1 and
the annular surface density ($\langle\kappa\rangle$) and slope ($\eta$) of the
numerical models. The agreement with the full numerical solutions is
excellent, even though the numerical models include both the ellipsoidal lens
galaxy and a group. No matter what the mass distribution is, the five lenses
PG1115+080, SBS1520+530, B1600+434, PKS1830-211,$^6$ and HE2149-2745 have very
similar dark matter halos. For a fixed slope $\eta$, the five systems are
consistent with a common value for the surface density of

$$ \langle\kappa\rangle = 1 - 1.07h + 0.14(\eta - 1)(1 - h) \pm 0.04 \qquad \mathrm{(B.99)} $$

where $H_0 = 100h$ km s$^{-1}$ Mpc$^{-1}$ and there is an upper limit of
$\sigma_\kappa \lesssim 0.07$ on the intrinsic scatter of
$\langle\kappa\rangle$. Thus, time delay lenses provide a new window into the
structure and homogeneity of dark matter halos, regardless of the actual value
of $H_0$.

$^6$ PKS1830-211 is included based on the Winn et al. (2002b) model of the HST
imaging data as a single lens galaxy. Courbin et al. (2002) prefer an
interpretation with multiple lens galaxies which would invalidate the analysis.
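
As a small numerical illustration of Eq. (B.99), the sketch below evaluates the central value of the relation and inverts it for $h$ at a given surface density and slope. It ignores the quoted $\pm 0.04$ uncertainty and the intrinsic scatter, and the function names are illustrative only.

```python
def mean_kappa(h, eta):
    """Central value of Eq. (B.99): <kappa> for a given h and slope eta."""
    return 1.0 - 1.07 * h + 0.14 * (eta - 1.0) * (1.0 - h)


def hubble_h(kappa, eta):
    """Invert Eq. (B.99) for h = H0 / (100 km/s/Mpc) at fixed <kappa> and eta."""
    slope_term = 0.14 * (eta - 1.0)
    return (1.0 + slope_term - kappa) / (1.07 + slope_term)


if __name__ == "__main__":
    # SIS-like halo (<kappa> = 1/2, eta = 2) versus a constant M/L-like
    # halo (<kappa> ~ 0.2, eta ~ 3), as described in the text.
    for label, kappa, eta in (("SIS-like", 0.5, 2.0), ("M/L-like", 0.2, 3.0)):
        print(f"{label}: h = {hubble_h(kappa, eta):.2f}")
```

Because this uses only the central value of the fitted relation for the five-lens sample, the numbers it returns need not match the joint likelihood estimates quoted for the four simple lenses.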

There is an enormous range of parametric models that can illustrate how the
extent of the halo affects $\langle\kappa\rangle$ and hence $H_0$ - the
pseudo-Jaffe model we used above is only one example. It is useful, however,
to use a physically motivated model where the lens galaxy is embedded in a
standard NFW (Navarro, Frenk, & White 1996) profile halo as we discussed at
the end of §B.4.1. The lens galaxy consists of the baryons that have cooled to
form stars, so the mass of the visible galaxy can be parameterized using the
cold baryon fraction $f_{b,\rm cold}$ of the halo, and for these CDM halo
models the value of $\langle\kappa\rangle$ is controlled by the cold baryon
fraction (Kochanek 2003a). A constant $M/L$ model is the limit
$f_{b,\rm cold} \to 1$ (with $\langle\kappa\rangle \simeq 0.2$,
$\eta \simeq 3$). Since the baryonic mass fraction of a CDM halo should not
exceed the global fraction of $f_b \simeq 0.17 \pm 0.03$ (e.g., Spergel et al.
2003), we cannot use constant $M/L$ models without also abandoning CDM. As we
reduce $f_{b,\rm cold}$, we are adding mass to an extended halo around the
lens, leading to an increase in $\langle\kappa\rangle$ and a decrease in
$\eta$. For $f_{b,\rm cold} \simeq 0.02$ the model closely resembles the SIS
model ($\langle\kappa\rangle \simeq 1/2$, $\eta \simeq 2$). If we reduce
$f_{b,\rm cold}$ further, the mass distribution begins to approach that of the
NFW halo without any cold baryons. Figure B.38 shows how
$\langle\kappa\rangle$ and $H_0$ depend on $f_{b,\rm cold}$ for PG1115+080,
SBS1520+530, B1600+434 and HE2149-2745. When $f_{b,\rm cold} \simeq 0.02$, the
CDM models have parameters very similar to the SIS model, and we obtain a very
similar estimate of $H_0 = 52 \pm 6$ km s$^{-1}$ Mpc$^{-1}$ (95% confidence).
If all baryons cool, and $f_{b,\rm cold} = f_b$, then we obtain
$H_0 = 65 \pm 6$ km s$^{-1}$ Mpc$^{-1}$ (95% confidence), which is still lower
than the Key Project estimates.

We excluded the lenses requiring significantly more complicated models with
multiple lens galaxies or very strong perturbations from clusters. If we have
yet to reach a consensus on the mass distribution of relatively isolated
lenses, it seems premature to extend the discussion to still more complicated
systems. We can, however, show that the cluster lenses require significant
contributions to $\langle\kappa\rangle$ from the cluster in order to produce
the same $H_0$ as the more isolated systems. As we discussed in §B.2, the
three more complex systems are RXJ0911+0551, Q0957+561 and B1608+656.

RXJ0911+0551 is very strongly perturbed by the nearby X-ray cluster (Morgan et
al. 2001; Hjorth et al. 2002). Kochanek (2003b) found
$H_0 = 49 \pm 5$ km s$^{-1}$ Mpc$^{-1}$ if the primary lens and its satellite
were isothermal and $H_0 = 67 \pm 5$ km s$^{-1}$ Mpc$^{-1}$ if they had
constant mass-to-light ratios. The higher value of
$H_0 = 71 \pm 4$ km s$^{-1}$ Mpc$^{-1}$ obtained by Hjorth et al. (2002) can
be understood by combining §B.5.1 and §B.5.2 with the differences in the
models. In particular, Hjorth et al. (2002) truncated the halo of the primary
lens near the Einstein radius and used a lower mass cluster, both of which
lower $\langle\kappa\rangle$ and raise $H_0$. The Hjorth et al. (2002) models
also included many more cluster galaxies assuming fixed masses and halo sizes.

Fig. B.38. $H_0$ in CDM halo models. The top panel shows
$1 - \langle\kappa\rangle$ for the "simple" lenses (PG1115+080, SBS1520+530,
B1600+434, and HE2149-2745) as a function of the cold baryon fraction
$f_{b,\rm cold}$. The solid (dashed) curves include (exclude) the adiabatic
compression of the dark matter by the baryons. The horizontal line shows the
value for an SIS potential. The bottom panel shows the resulting estimates of
$H_0$, where the shaded envelope bracketing the curves is the 95% confidence
region for the combined lens sample. The horizontal band shows the Key Project
estimate. For larger $f_{b,\rm cold}$, the density $\langle\kappa\rangle$
decreases and the local slope $\eta$ steepens, leading to larger values of
$H_0$. The vertical bands in the two panels show the lower bound on $f_b$ from
local inventories and the upper bound from the CMB.

Q0957+561 is a special case because the primary lens galaxy is the brightest
cluster galaxy and it lies nearly at the cluster center (Keeton et al. 2000;
Chartas et al. 2002). As a result, the lens modeling problems are particularly
severe, and Keeton et al. (2000) found that all previous models (most
recently, Barkana et al. 1999; Bernstein & Fischer 1999; and Chae 1999; see
§B.4.6) were incompatible with the observed geometry of the lensed host
galaxy. While Keeton et al. (2000) found models consistent with the structure
of the lensed host, they covered a range of almost ±25% in their estimates of
$H_0$. A satisfactory treatment of this lens remains elusive.

HE1104-1805 has the most recently measured time delay (Ofek & Maoz 2003;
Wyrzykowski et al. 2003). Given the $\Delta t = 161 \pm 7$ day delay, a
standard SIE model of this system predicts a very high
$H_0 \simeq 90$ km s$^{-1}$ Mpc$^{-1}$. The geometry of this system and the
fact that the inner image is brighter than the outer image both suggest that
HE1104-1805 lies in an anomalously high tidal shear field, while the standard
model includes a prior to keep the external shear small. A prior is needed
because a two-image lens supplies too few constraints to determine both the
ellipticity of the main lens and the external shear simultaneously. Since the
images and the lens in HE1104-1805 are nearly collinear, the anomalous $H_0$
estimate for the standard model may be an example of the shear degeneracy we
briefly mentioned in §B.5.1. At present the model surveys needed to understand
the new delay have not been made. Observations of the geometry of the host
galaxy Einstein ring will resolve any ambiguities due to the shear in the near
future (see §B.10).

The lens B1608+656 consists of two interacting galaxies, and, as we discussed
in §B.2, this leads to a greatly increased parameter space. Fassnacht et al.
(2002) used SIE models for the two galaxies to find
$H_0 = 61$-$65$ km s$^{-1}$ Mpc$^{-1}$, depending on whether the lens galaxy
positions are taken from the H-band or I-band HST images of the lens (the
statistical errors are negligible). The position differences are probably
created by extinction effects from the dust in the lens galaxies. Like
isothermal models of the "simple" lenses, the $H_0$ estimate is below local
values, but the disagreement is smaller. These models correctly match the
observed time delay ratios. Koopmans et al. (2003) obtain a still higher
estimate of $H_0 = 75 \pm 7$ km s$^{-1}$ Mpc$^{-1}$, largely because the lens
galaxy positions shift after they include extinction corrections. They use a
foreground screen model to make the extinction corrections, which is a better
approximation than no extinction corrections, but is unlikely to allow precise
correction in a system like B1608+656 where the dust and stars are mixed and
there is no simple relation between color excess and optical depth (e.g. Witt,
Thronson & Capuano 1992).

Despite recent progress both in modeling the VLBI structure (Wucknitz et al.
2004) and obtaining deep images (York et al. 2004), it is unclear whether
B0218+357 has escaped its problems with astrometry and models. While York et
al. (2004) have clearly measured the position of the lens galaxy, the
dependence of the position on the choice of the PSF model remains a
significant source of uncertainty for estimates of $H_0$. Models of the system
using power law models find a slope very close to isothermal,
$\eta = 2.04 \pm 0.02$ ($\rho \propto r^{-\eta}$). Unfortunately, these models
have too few degrees of freedom given the small astrometric uncertainties in
the VLBI structures providing the constraints (because the only angular
structure in the model is the ellipsoidal potential used for the main lens
galaxy), and this makes the limits on the power-law slope suspect (see
§B.4.6). For example, while it is true that Lehar et al. (2000) estimated that
the environmental shear near B0218+357 was small, even a $\gamma = 0.01$
external tidal shear produces deflections (3 milli-arcseconds) that are large
compared to the accuracy of the constraints used for the models and so must be
included for the models to be reliable. With these caveats, B0218+357 (like
the models of B1608+656 with significant extinction corrections) supports a
nearly isothermal mass distribution with
$H_0 = 73 \pm 8$ km s$^{-1}$ Mpc$^{-1}$.

B.5.5 The Future of Time Delay Measurements 

We understand the theory of time delays very well - the only important 
variable in the lens structure is the average surface density $\langle\kappa\rangle$ of the lens
near the images for which the delay is measured. The angular structure of the 
potential has an effect on the delays, but it is either small or well-constrained 
by the observed image positions. Provided a lens does not lie in a cluster where 
the cluster potential cannot be described by a simple expansion, any lens 
model that includes the parameters needed to vary the average surface density 
of the lens near the images and to change the ratio between the quadrupole
moment of the lens and the environment includes all the variables needed to 
model time delays, to estimate the Hubble constant, and to understand the 
systematic uncertainties in the results. Unfortunately, there is a tendency in 
the literature to confuse rather than to illuminate this understanding, even 
though all differences between estimates of the Hubble constant for the simple 
time delay lenses can be understood on this basis. 

The problem with time delays lies with the confusing state of the data. The
four simplest time delay lenses, PG1115+080, SBS1520+530, B1600+434 and
HE2149-2745, can only match the currently preferred estimate of
$H_0 \simeq 72 \pm 8$ km s$^{-1}$ Mpc$^{-1}$ (Freedman et al. 2001; Spergel et
al. 2003) if they have nearly constant $M/L$ mass distributions. If they have
the favored quasi-isothermal mass distributions, then
$H_0 \simeq 48 \pm 3$ km s$^{-1}$ Mpc$^{-1}$. This leads to a conundrum: why
do simple lenses with time delay measurements have falling rotation curves,
while simple lenses with direct estimates of the mass profile do not? This is
further confused by B1608+656 and B0218+357, which due to their observational
complexity would be the last systems I would attempt to understand, but in
current analyses can be both isothermal and have high $H_0$. In resolving this
problem it is not enough to search for a "Golden Lens." There is no such
thing! While chanting "My lens is better than your lens!" may be satisfying,
it contributes little to understanding the basic problem.

The difficulty at the moment is that systems I would view as problematic
(B0218+357 due to problems in astrometry or B1608+656 due to the interacting
lens galaxies) allow both mass distributions with flat rotation curves and
$H_0 = 72$ km s$^{-1}$ Mpc$^{-1}$, while systems that should be simpler to
interpret (the simple lenses in Table B.5.2) do not. Yet the preponderance of
evidence on the mass distributions of lens galaxies suggests that they are 
fairly homogeneous in structure and have roughly flat rotation curves (§B.4).
The simplest way to clarify this problem is to measure accurate time delays 
for many more systems. At a fixed value of the Hubble constant we will ei- 
ther find significant scatter in the surface densities near the images of simple 
lenses or we will not. 

B.6 Gravitational Lens Statistics 

It is the opinion of the author that the statistics of lenses as a method for
determining the cosmological model has largely ceased to be interesting.
However, it is important to understand the underlying physics because it
determines the types of lenses we detect. While most recent analyses have
found cosmological results consistent with the concordance model (Chae et al.
2002; Chae 2003; Davis, Huterer & Krauss 2003; Mitchell et al. 2004), there
are still large statistical uncertainties and some dangerous systematic
assumptions. More importantly, there is little prospect at present of lens
statistics becoming competitive with other methods. Gravitational lens
statistics arguably begins with Press & Gunn (1973), although the "modern" era
begins with the introduction of magnification bias (Turner 1980), the basic
statistics of normal galaxy lenses (Turner, Ostriker & Gott 1984), cross
sections and optical depths for more general lenses (Blandford & Kochanek
1987a; Kochanek & Blandford 1987), explorations of the effects of general
cosmologies (Fukugita et al. 1990; Fukugita & Turner 1991) and lens structure
(Maoz & Rix 1993; Kochanek 1996a) and the development of the general
methodology of interpreting observations (Kochanek 1993b, 1996a).

B.6.1 The Mechanics of Surveys 

There are two basic approaches to searching for gravitational lenses depending
on whether you start with a list of potentially lensed sources or a list of
potential lens galaxies. Of the two, only a search of sources for lensed
sources has a significant cosmological sensitivity - for a non-evolving
population of lenses in a flat cosmological model we will find in §B.6.3 that
the number of lensed sources scales with the volume between the observer and
the source, $D_s^3$. If you search potential lens galaxies for those which
have actually lensed a source, then the cosmological dependence enters only
through distance ratios, $D_{ds}/D_s$, and you require a precise knowledge of
the source redshift distribution. Thus, while lenses found in this manner are
very useful for many projects (mass distributions, galaxy evolution etc.),
they are not very useful for determining the cosmological model. This changes
for the case of cluster lenses where you may find multiple lensed sources at
different redshifts behind the same lens (e.g. Soucail, Kneib & Golse 2004).
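
To give a feel for the cosmological sensitivity of source-selected surveys, the sketch below compares the cube of the comoving distance to a source in two flat models. It assumes that the lens counts scale simply as the cube of the comoving source distance, as for a non-evolving lens population in a flat model, and the parameter values ($\Omega_m = 0.3$ versus $\Omega_m = 1$, source at $z = 2$) are illustrative choices rather than values taken from the text.

```python
import math


def comoving_distance(z, omega_m, steps=2000):
    """Comoving distance to redshift z, in units of c/H0, for a flat universe
    containing matter (omega_m) and a cosmological constant (1 - omega_m).

    Uses Simpson's rule; `steps` must be even.
    """
    omega_l = 1.0 - omega_m

    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)

    dz = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * dz)
    return total * dz / 3.0


def relative_lens_counts(z_source, omega_m_a, omega_m_b):
    """Ratio of D_s^3 (a proxy for the number of lensed sources) in model a
    relative to model b, for sources at redshift z_source."""
    d_a = comoving_distance(z_source, omega_m_a)
    d_b = comoving_distance(z_source, omega_m_b)
    return (d_a / d_b) ** 3


if __name__ == "__main__":
    ratio = relative_lens_counts(2.0, 0.3, 1.0)
    print(f"D_s^3 ratio (Omega_m = 0.3 vs 1.0) at z_s = 2: {ratio:.2f}")
```

The cubic dependence on distance is what makes the counts of lensed sources sensitive to the cosmological model, whereas a distance ratio such as $D_{ds}/D_s$ changes far less between models.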

Most lenses have been found by searching for lensed sources because the number
of targets which must be surveyed is considerably smaller. This is basically a
statement about the relative surface densities of candidate sources and
lenses. The typical lens is a galaxy with an Einstein radius of approximately
$b \simeq 1''.0$, so it has a cross section of order $\pi b^2$. If you search
$N$ lenses with such a cross section for signs of a lensed source, you would
expect to find $N \pi b^2 \Sigma_{\rm source}$ lenses, where
$\Sigma_{\rm source}$ is the surface density of detectable sources. If you
search $N$ sources for a lens galaxy in front of them, you would expect to
find $N \pi b^2 \Sigma_{\rm lens}$ lenses, where $\Sigma_{\rm lens}$ is the
surface density of lens galaxies. Since the surface density of massive
galaxies is significantly higher than the surface density of easily detectable
higher redshift sources ($\Sigma_{\rm lens} \gg \Sigma_{\rm source}$), you
need examine fewer sources than lens galaxies to find the same number of
lensed systems. This is somewhat mitigated by the fact that the surface
density of potential lens galaxies is high enough to allow you to examine many
potential lenses in a single observation, while the surface density of sources
is usually so low that they can be examined only one at a time.
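
The counting argument above reduces to simple arithmetic, sketched below. The two surface densities used in the demonstration are hypothetical placeholders chosen only to show the scaling, not measured values.

```python
import math

ARCSEC2_PER_DEG2 = 3600.0 ** 2  # square arcseconds in one square degree


def expected_lenses(n_targets, einstein_radius_arcsec, surface_density_per_deg2):
    """Expected number of lensed systems, N * pi * b^2 * Sigma.

    n_targets is the number of objects searched (lens galaxies or sources),
    and surface_density_per_deg2 is the surface density of the *other*
    population (sources behind lenses, or lenses in front of sources).
    """
    cross_section_deg2 = math.pi * einstein_radius_arcsec ** 2 / ARCSEC2_PER_DEG2
    return n_targets * cross_section_deg2 * surface_density_per_deg2


if __name__ == "__main__":
    # Hypothetical surface densities (per square degree), for illustration only.
    sigma_source = 10.0
    sigma_lens = 1000.0
    n = 10000
    print("search lens galaxies for sources:", expected_lenses(n, 1.0, sigma_source))
    print("search sources for lens galaxies:", expected_lenses(n, 1.0, sigma_lens))
```

For the same number of targets, the yield of the two strategies differs by the ratio $\Sigma_{\rm lens}/\Sigma_{\rm source}$, which is the point made in the text.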

For these reasons, we present a short synopsis of searches for sources behind
lenses and devote most of this section to the search for lenses in front of
sources. The first method for finding sources behind lenses is a simple
byproduct of redshift surveys. Redshift surveys take spectra of the central
regions of low redshift galaxies, allowing the detection of spectral features
from any lensed images inside the aperture used for the spectrum. Thus, the
lens Q2237+0305 was found in the CfA redshift survey (Huchra et al. 1985) and
SDSS0903+5028 (Johnston et al. 2003) was found in the SDSS survey. Theoretical
estimates (Kochanek 1992b; Mortlock & Webster 2000c) suggest that the
discovery rate should be one per $10^4$-$10^5$ redshift measurements, but this
does not seem to be borne out by the number of systems discovered in this age
of massive redshift surveys (the origin of the lower rate in the 2dF survey is
discussed by Mortlock & Webster 2001). Miralda-Escude & Lehar (1992) proposed
searching for lensed optical (emission line) rings, a strategy successfully
used by Warren et al. (1996) to find 0047-2808 and by Ratnatunga, Griffiths &
Ostrander (1999) to find lenses in the HST Medium Deep Survey (MDS). There is
also a hybrid approach whose main objective is simply to find lenses with
minimal follow up observations by looking for high redshift radio lobes that
have non-stellar optical counterparts (Lehar et al. 2001). Since radio lobes
have no intrinsic optical emission, a lobe superposed on a galaxy is an
excellent lens candidate. The present limitation on this method is the low
angular resolution of the available all sky radio surveys (FIRST, NVSS) and
the magnitude limits and star/galaxy separation problems of the current
all-sky optical catalogs. Nonetheless, several systems have been discovered by
this technique.

Fig. B.39. Example of a local galaxy luminosity function. These are the K-band
luminosity functions for either all galaxies or by morphological type from
Kochanek et al. (2001). The curves show the best fit Schechter models for the
luminosity functions while the points with error bars show a non-parametric
reconstruction.

The majority of lens surveys, however, have focused on either optical quasars
or radio sources because they are source populations known to lie at
relatively high redshift ($z_s \gtrsim 1$) and that are easily detected even
when there is an intervening lens galaxy. Surveys of optical quasars
(Crampton, McClure & Fletcher 1992; Yee, Filippenko & Tang 1993; Maoz et al.
1993; Surdej et al. 1993; Kochanek, Falco & Schild 1995) have the advantage
that the sources are bright, and the disadvantages that the bright sources can
mask the lens galaxy and that the selection process is modified by dust in the
lens galaxy and emission from the lens galaxy. We will discuss these effects
in §B.9. While many more lensed quasars have been discovered since these
efforts, none of the recent results have been presented as a survey. Surveys
of all radio sources (the MIT/Greenbank survey, Burke, Lehar & Conner 1992)
have the advantage that most lensed radio sources are produced by extended,
steep spectrum sources (see Kochanek & Lawrence 1990), and the disadvantage
that the complex intrinsic structures of extended radio sources make the
follow up observations difficult. Surveys of flat spectrum radio sources (the
CLASS survey, Browne et al. 2003; the PANELS survey, Winn, Hewitt & Schechter
2001) have the advantage that the follow up observations are relatively simple
because most unlensed flat spectrum sources are (nearly) point sources. There
are disadvantages as well - because the source structure is so simple, flat
spectrum lenses tend to provide fewer constraints on mass models than steep
spectrum lenses. The radio sources also tend to be optically faint, making it
difficult to determine their redshifts in many cases.

Fig. B.40. Schechter parameters $\alpha$ and $M_*$ for the 2MASS luminosity
functions shown in Fig. B.39. Note that there are significant correlations not
only between $\alpha$ and $M_*$ but also with the comoving density scale $n_*$
that should be included in lens statistical calculations but generally are not.

The second issue for any survey is to understand the method by which the
sources were originally identified. For example, it is important to know
whether the flux of a lensed source in the input catalog is the total flux of
all the images or only a part of the flux (e.g. the flux of the brightest
image). This will have a significant effect on the statistical corrections for
using a flux-limited catalog, a correction known in gravitational lensing as
the "magnification bias" (see §B.6.6). All large, published surveys were
essentially drawn from samples which would include the total flux of a lensed
system. It is also important to know whether the survey imposed any criterion
for the sources being point-like, since lensed sources are not, or any color
criterion that might be violated by lensed sources with bright lens galaxies
or significant extinction.

The third issue for any survey is to consider the desired selection function
of the observations. This is some combination of resolution, dynamic range and
field of view. These determine the range of lens separations that are
detectable, the nature of any background sources, and the cost of any follow
up observations. Any survey is a trade-off between completeness (what fraction
of all lenses in the sample can be discovered), false positives (how many
objects selected as lens candidates are not lenses), and the cost of follow-up
observations. The exact strategy is not critical provided it is
well-understood. The primary advantages of the surveys of flat spectrum radio
sources are the relatively low false positive rates and follow up costs
produced by using a source population consisting almost entirely of point
sources with no contaminating background population. This does not mean that
the flat spectrum surveys are free of false positives - core-jet sources can
initially look like asymmetric two-image lenses. On small angular scales
($\Delta\theta \lesssim 3''.0$) the quasar surveys share this advantage, but
for wider separations there is contamination from binary quasars (see §B.7.2)
and Galactic stars (see Kochanek 1993a).

B.6.2 The Lens Population 

The probability that a source has an intervening lens requires a model for the
distribution of the lens galaxies. In almost all cases these are based on the
luminosity function of local galaxies combined with the assumption that the
comoving density of galaxies does not evolve with redshift. Of course
luminosity is not mass, so a model for converting the luminosity of a local
galaxy into its deflection scale as a lens is a critical part of the process.
For our purposes, the distributions of galaxies in luminosity are
well-described by a Schechter (1976) function,

$$ \frac{dn}{dL} = \frac{n_*}{L_*} \left(\frac{L}{L_*}\right)^{\alpha} \exp\left(-\frac{L}{L_*}\right). \qquad \mathrm{(B.100)} $$

The Schechter function has three parameters: a characteristic luminosity $L_*$
(or absolute magnitude $M_*$), an exponent $\alpha$ describing the rise at low
luminosity, and a comoving density scale $n_*$. All these parameters depend on
the type of galaxy being described and the wavelength of the observations. In
general, lens calculations have divided the galaxy population into two broad
classes: late-type (spiral) galaxies and early-type (E/S0) galaxies. Over the
period lens statistics developed, most luminosity functions were measured in
the blue, where early and late-type galaxies showed similar characteristic
luminosities. The definition of a galaxy type is a slippery problem - it may
be defined by the morphology of the surface brightness (the traditional
method), spectral classifications (the modern method since it is easy to do in
redshift surveys), colors (closely related to spectra but not identical), and
stellar kinematics (ordered rotational motions versus random motions). Each
approach has advantages and disadvantages, but it is important to realize that
the kinematic definition is the one most closely related to gravitational
lensing and the one never supplied by local surveys. Fig. B.39 shows an
example of a luminosity function, in this case the K-band infrared luminosity
function by Kochanek et al. (2001; also Cole et al. 2001) where
$M_{K*e} = -23.53 \pm 0.06$ mag,
$n_{*e} = (0.45 \pm 0.06) \times 10^{-2} h^3$ Mpc$^{-3}$, and
$\alpha_e = -0.87 \pm 0.09$ for galaxies which were morphologically early-type
galaxies and $M_{K*l} = -22.98 \pm 0.06$ mag,
$n_{*l} = (1.01 \pm 0.13) \times 10^{-2} h^3$ Mpc$^{-3}$, and
$\alpha_l = -0.92 \pm 0.10$ for galaxies which were morphologically late-type
galaxies. Early-type galaxies are less common but brighter than late-type
galaxies at K-band. It is important to realize that the parameter estimates of
the Schechter function are correlated, as shown in Fig. B.40, and that it is
dangerous to simply extrapolate them to fainter luminosities than were
actually included in the survey.

Fig. B.41. K-band kinematic relations for 2MASS galaxies. The top panels show
the Faber-Jackson relation and the bottom panels show the Tully-Fisher
relations for 2MASS galaxies with velocity dispersions and circular velocities
drawn from the literature. The left hand panels show the individual galaxies,
while the right hand panels show the mean relations. Note the far larger
scatter of the Faber-Jackson relation compared to the Tully-Fisher relation.

Fig. B.42. The resulting velocity functions from combining the K-band
luminosity functions (Fig. B.39) and kinematic relations (Fig. B.41) for
early-type (top), late-type (middle) and all (bottom) galaxies. The points
show partially non-parametric estimates of the velocity function based on the
binned estimates in the right hand panels of Fig. B.41 rather than power-law
fits. Note that early-type galaxies dominate for high circular velocity.
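
The Schechter form and the K-band parameters quoted above can be evaluated directly. The following is a minimal sketch: it uses only the central parameter values (ignoring their correlated uncertainties), integrates numerically with a simple trapezoid rule, and the function names and integration limits are illustrative choices.

```python
import math


def schechter(x, n_star, alpha):
    """Schechter function dn/d(L/L*) at x = L/L*, in the units of n_star."""
    return n_star * x ** alpha * math.exp(-x)


def density_brighter_than(x_min, n_star, alpha, x_max=50.0, steps=4000):
    """Comoving density of galaxies with L/L* > x_min.

    Trapezoid rule in ln(x); the integrand picks up a factor x from dx = x dln(x).
    """
    ln_lo, ln_hi = math.log(x_min), math.log(x_max)
    d_ln = (ln_hi - ln_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = math.exp(ln_lo + i * d_ln)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * schechter(x, n_star, alpha) * x
    return total * d_ln


if __name__ == "__main__":
    # Central values quoted in the text (densities in h^3 Mpc^-3).
    early = dict(m_star=-23.53, n_star=0.45e-2, alpha=-0.87)
    late = dict(m_star=-22.98, n_star=1.01e-2, alpha=-0.92)

    # Compare both types above the same luminosity, L = L* of early types.
    x_late = 10.0 ** (0.4 * (late["m_star"] - early["m_star"]))
    n_early = density_brighter_than(1.0, early["n_star"], early["alpha"])
    n_late = density_brighter_than(x_late, late["n_star"], late["alpha"])
    print(f"n(>L*_e): early = {n_early:.2e}, late = {n_late:.2e} h^3 Mpc^-3")
```

The lower limit is kept at or above $L_*$ deliberately, since extrapolating the fits to luminosities fainter than those sampled by the survey is unsafe.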

However, light is not mass, and it is mass which determines lensing
properties. One approach would simply be to assign a mass-to-light ratio to
the galaxies and compute the expected properties of the lenses. This was
attempted only in Maoz & Rix (1993), who found that for normal stellar mass to
light ratios it was impossible to reproduce the data (although it is possible
if you adjust the mass-to-light ratio to fit the data, also see Kochanek
1996a). Instead, most studies convert the luminosity function $dn/dL$ into a
velocity function $dn/dv$ using the local kinematic properties of galaxies and
then relate the stellar kinematics to the properties of the lens model. As
Fig. B.41 shows (for the same K-band magnitudes of our luminosity function
example), both early-type and late-type galaxies show correlations between
luminosity and velocity. For late-type galaxies there is a tight correlation
known as the Tully-Fisher (1977) relation between luminosity $L$ and circular
velocity $v_c$, and for early-type galaxies there is a loose correlation known
as the Faber-Jackson (1976) relation between luminosity and central velocity
dispersion $\sigma_v$. Early-type galaxies do show a much tighter correlation
known as the fundamental plane (Dressler et al. 1987; Djorgovski & Davis
1987), but it is a three-variable correlation between the velocity dispersion,
effective radius and surface brightness (or luminosity) that we will discuss
in §B.9. While there is probably some effect of the FP correlation on lens
statistics, it has yet to be found. For lens calculations, the circular
velocity of late-type galaxies is usually converted into an equivalent
(isotropic) velocity dispersion using $v_c = \sqrt{2}\,\sigma_v$. We can
derive the kinematic relations for the same K-band-selected galaxies used in
the Kochanek et al. (2001) luminosity function, finding the Faber-Jackson
relation

$$M_K - 5\log h = (-23.83 \pm 0.03) - 2.5\,(4.04 \pm 0.18)\,(\log v_c - 2.5) \qquad \text{(B.101)}$$

and the Tully-Fisher relation

$$M_K - 5\log h = (-22.92 \pm 0.02) - 2.5\,(3.96 \pm 0.08)\,(\log v_c - 2.3). \qquad \text{(B.102)}$$

These correlations, when combined with the K-band luminosity function, have
the advantage that the magnitude systems for the luminosity function and the
kinematic relations are identical. Magnitude conversions have caused problems
for a number of lens statistical studies using older photographic luminosity
functions and kinematic relations. For these relations, the characteristic
velocity dispersion of an $L_*$ early-type galaxy is $\sigma_{*e} \simeq 209$
km/s while that of an $L_*$ late-type galaxy is $\sigma_{*l} \simeq 143$ km/s.
These are fairly typical values even though they are derived from a completely
independent set of photometric data.
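
As a quick check of the characteristic dispersions quoted above, the following
minimal sketch inverts the two kinematic relations (Eqns. B.101 and B.102) at
the characteristic magnitudes of the K-band luminosity function and converts
the circular velocity with $v_c = \sqrt{2}\,\sigma_v$. The function names are
illustrative choices, not part of the original analysis, and only the central
values of the fits are used.

```python
import math

def circular_velocity(m_k, zero_point, slope, log_v_pivot):
    """Invert M_K - 5 log h = zero_point - 2.5*slope*(log10 v_c - log_v_pivot).

    Returns v_c in km/s.
    """
    log_vc = log_v_pivot + (zero_point - m_k) / (2.5 * slope)
    return 10.0 ** log_vc

def equivalent_dispersion(v_c):
    """Isotropic velocity dispersion equivalent to v_c, from v_c = sqrt(2) sigma_v."""
    return v_c / math.sqrt(2.0)

# Early types: Faber-Jackson relation (B.101) evaluated at M_K*e = -23.53
sigma_star_early = equivalent_dispersion(
    circular_velocity(-23.53, -23.83, 4.04, 2.5))

# Late types: Tully-Fisher relation (B.102) evaluated at M_K*l = -22.98
sigma_star_late = equivalent_dispersion(
    circular_velocity(-22.98, -22.92, 3.96, 2.3))

print(f"sigma_*e = {sigma_star_early:.0f} km/s")
print(f"sigma_*l = {sigma_star_late:.0f} km/s")
```

Both values round to the 209 km/s and 143 km/s quoted in the text.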

Both the Faber-Jackson and Tully-Fisher relations are power-law relations
between luminosity and velocity, $L/L_* \propto (\sigma_v/\sigma_*)^{\gamma_{FJ}}$.
This allows a simple variable transformation to convert the Schechter
luminosity function into a velocity function,

$$\frac{dn}{d\sigma_v} = \frac{dn}{dL}\left|\frac{dL}{d\sigma_v}\right| = \gamma_{FJ}\,\frac{n_*}{\sigma_*}\left(\frac{\sigma_v}{\sigma_*}\right)^{(1+\alpha)\gamma_{FJ}-1}\exp\left[-\left(\frac{\sigma_v}{\sigma_*}\right)^{\gamma_{FJ}}\right]. \qquad \text{(B.103)}$$

Fig. B.43. Stellar velocity dispersions $v_{los}$ for a Hernquist distribution
of stars in an isothermal halo of dispersion $\sigma_{DM}$. The solid curves
show the local value $v_{los}$ and the dashed curves show the mean interior to
the radius $\langle v_{los}^2 \rangle^{1/2}$. Local velocity dispersions are
typically measured on scales similar to $R_e/8$, where the stellar and dark
matter dispersions are nearly equal, rather than matching the virial theorem
limit which would be reached in an infinite aperture. The upper, lower and
middle curves are for stars with isotropies of $\beta = 0.2$ (somewhat radial),
$\beta = 0$ (isotropic) and $\beta = -0.2$ (somewhat tangential).
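
The change of variables from luminosity to velocity dispersion can be sketched
numerically. The snippet below is an illustration under stated assumptions: a
Schechter function in $x = L/L_*$, a pure power-law kinematic relation
$x = (\sigma_v/\sigma_*)^{\gamma_{FJ}}$ with no scatter, and hypothetical
function names. It evaluates the closed-form velocity function and
cross-checks it against an explicit chain-rule evaluation.

```python
import math

def schechter(x, n_star, alpha):
    """Schechter function dn/dx for x = L/L_* (x > 0)."""
    return n_star * x ** alpha * math.exp(-x)

def velocity_function(sigma, n_star, sigma_star, alpha, gamma):
    """Closed-form dn/dsigma for L/L_* = (sigma/sigma_*)^gamma (sigma > 0)."""
    s = sigma / sigma_star
    return (gamma * (n_star / sigma_star)
            * s ** ((1.0 + alpha) * gamma - 1.0) * math.exp(-s ** gamma))

def velocity_function_chain_rule(sigma, n_star, sigma_star, alpha, gamma):
    """Same quantity computed as (dn/dx) * (dx/dsigma), as a cross-check."""
    s = sigma / sigma_star
    x = s ** gamma
    dx_dsigma = gamma * s ** (gamma - 1.0) / sigma_star
    return schechter(x, n_star, alpha) * dx_dsigma

# Early-type parameters quoted in the text (n_* in h^3 Mpc^-3, sigma_* in km/s)
n_star, sigma_star, alpha, gamma = 0.45e-2, 209.0, -0.87, 4.04
for sigma in (100.0, 209.0, 300.0):
    print(sigma, velocity_function(sigma, n_star, sigma_star, alpha, gamma))
```

The fourth moment of this velocity function, which is what enters the lensing
optical depth, integrates to $n_* \sigma_*^4\, \Gamma[1+\alpha+4/\gamma_{FJ}]$.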



There are three caveats to keep in mind about this variable change. First, we
have converted to the distribution in stellar velocities, not some underlying
velocity characterizing the dark matter distribution. Many early studies
assumed a fixed transformation between the characteristic velocity of the stars
and the lens model. In particular, Turner, Ostriker & Gott (1984) introduced
the assumption $\sigma_{dark} = (3/2)^{1/2}\,\sigma_{stars}$ for an isothermal
mass model based on the stellar dynamics (Jeans equation, Eqn. B.90 and §B.4.9)
of an $r^{-3}$ stellar density distribution in an $r^{-2}$ isothermal mass
distribution. Kochanek (1993b, 1994) showed that this oversimplified the
dynamics and that if you embed a real stellar luminosity distribution in an
isothermal mass distribution you actually find that the central stellar
velocity dispersion is close to the velocity dispersion characterizing the dark
matter halo. Fig. B.43 compares the stellar velocity dispersion to the dark
matter halo dispersion for a Hernquist distribution of stars in an isothermal
mass distribution. Such a normalization calculation is required for any attempt
to match the observed velocity functions with a particular mass model for the
lenses. Second, in an ideal world, the luminosity function and the kinematic
relations should be derived from a consistent set of photometric data, while in
practice they rarely are. As we will see shortly, the cross section for lensing
scales roughly as $\sigma_*^4$, so small errors in estimates of the
characteristic velocity have enormous impacts on the resulting cosmological
results - a 5% velocity calibration error leads to a 20% error in the lens
cross section. Since luminosity functions and kinematic relations are rarely
derived consistently (the exception is Sheth et al. 2003), the resulting
systematic errors creep into cosmological estimates. Finally, for the
early-type galaxies where the Faber-Jackson kinematic relation has significant
scatter, transforming the luminosity function using the mean relation as we did
in Eqn. B.103 while ignoring the scatter underestimates the number of high
velocity dispersion galaxies (Kochanek 1994, Sheth et al. 2003). This leads to
underestimates of both the image separations and the cross sections. The
fundamental lesson of all these issues is that the mass scale of the lenses
should be "self-calibrated" from the observed separation distribution of the
lenses rather than imposed using local observations (as we discuss below).
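
The sensitivity to the velocity calibration is simple error propagation through
a fourth power. The sketch below illustrates it, together with the direction of
the bias from ignoring scatter. The log-normal form of the scatter and the
0.05 dex value are assumptions made only for this illustration, not
measurements from the text.

```python
import math

def cross_section_error(velocity_error, power=4.0):
    """Fractional change of a quantity scaling as sigma^power
    when sigma is mis-calibrated by the fraction velocity_error."""
    return (1.0 + velocity_error) ** power - 1.0

def scatter_boost(scatter_dex, power=4.0):
    """Ratio <sigma^power> / (median sigma)^power for log-normal scatter
    of scatter_dex (in dex) about the median; always >= 1."""
    s_ln = scatter_dex * math.log(10.0)
    return math.exp(0.5 * (power * s_ln) ** 2)

# A 5% calibration error in sigma_* changes sigma_*^4 by roughly 20%
print(f"{cross_section_error(0.05):.3f}")

# Hypothetical 0.05 dex scatter: the mean fourth power exceeds the
# fourth power of the median, so ignoring scatter underestimates it
print(f"{scatter_boost(0.05):.3f}")
```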

Most lens calculations have assumed that the comoving density of the lenses
does not evolve with redshift. For moderate redshift sources this only requires
little evolution for $z_l < 1$ (mostly $z_l < 0.5$), but for higher redshift
sources it is important to think about evolution as well. The exact degree of
evolution is the subject of some debate, but a standard theoretical prediction
for the change between now and redshift unity is shown in Fig. B.44 (see
Mitchell et al. 2004 and references therein). Because lower mass systems merge
to form higher mass systems as the universe evolves, low mass systems are
expected to be more abundant at higher redshifts while higher mass systems
become less abundant. For the $\sigma_v \sim \sigma_* \sim 200$ km/s galaxies
which dominate lens statistics, the evolution in the number of galaxies is
actually quite modest out to redshift unity, so we would expect galaxy
evolution to have little effect on lens statistics. Higher mass systems evolve
rapidly and are far less abundant at redshift unity, but these systems will
tend to be group and cluster halos rather than galaxies, and the failure of the
baryons to cool in these systems is of greater importance to their lensing
effects than their number evolution (see §B.7). There have been a number of
studies examining lens statistics with number evolution (e.g. Mao 1991, Mao &
Kochanek 1994, Rix et al. 1994) and several attempts to use the lens data to
constrain the evolution (Ofek, Rix & Maoz 2003, Chae & Mao 2003, Davis, Huterer
& Krauss 2003).

Fig. B.44. The ratio of the velocity function of halos at z = 1 to that at
z = 0 from Mitchell et al. (2004). The solid curve shows the expectation for an
$\Omega_\Lambda = 0.78$ flat cosmological model. The points show results from
an N-body simulation with $\Omega_\Lambda = 0.7$ and the dashed curve shows the
theoretical expectation. For comparison, the dotted curve shows the evolution
model used by Chae & Mao (2003).



B.6.3 Cross Sections 

The basic quantity we need for any statistical analysis is the cross section
of the lens for producing the desired lensing effect (e.g. multiple images, two
images, bright images...). The simplest cross section is the multiple imaging
cross section of the SIS lens - the angular area on the source plane in which a
source will produce two lensed images. We know from Eqns. B.21 and B.22 that
the source must lie within Einstein radius b of the lens center to produce
multiple images, so the cross section is simply $\sigma_{SIS} = \pi b^2$. Since
the Einstein radius $b = 4\pi(\sigma_v/c)^2 D_{ds}/D_s$ depends on the velocity
dispersion and redshift of the lens galaxy, we will need a model for the
distribution of lenses in redshift and velocity dispersion to estimate the
optical depth for lensing. If we are normalizing directly to stellar dynamical
measurements of lenses, then we will also need a dynamical model (e.g. the
Jeans equations of §B.4.9) to relate the observed stellar velocity dispersions
to the characteristic dark matter velocity dispersion $\sigma_v$ appearing as a
parameter of the SIS model. We can also compute cross sections for obtaining
different image morphologies. For example, in Eqn. B.32 we calculated the
caustic boundaries for the four-image region of an SIS in an external shear
$\gamma$. If we integrate to find the area inside the caustic we obtain the
four-image cross section

$$\sigma_4 = \frac{3\pi}{2}\,\frac{\gamma^2}{1-\gamma^2}\,b^2 \qquad \text{(B.104)}$$

while (provided $|\gamma| < 1/3$) the two-image cross section is
$\sigma_2 = \sigma_{SIS} - \sigma_4 \simeq \sigma_{SIS}$. If the shear is
larger, then the tips of the astroid caustic extend beyond the radial
(pseudo-)caustic and the lens has regions producing two images, three images in
the disk geometry (Fig. B.18), and four images, with no simple expression for
the cross sections. There are no analytic results for the singular isothermal
ellipsoid (Eqn. B.37 with s = 0), but we can power expand the cross section as
a series in the ellipticity to find at lowest order that

$$\sigma_4 = \frac{\pi}{6}\,b^2 e^2 \qquad \text{(B.105)}$$

for a lens with axis ratio q = 1 - e, while the total cross section is
$\sigma_{SIS} = \pi b^2$ (e.g. Kochanek 1996b, Finch et al. 2002). As a general
rule, a lens of ellipticity e is roughly equivalent to a spherical lens in an
external shear of $\gamma \simeq e/3$. According to the cross sections, the
fraction of four-image lenses should be of order
$\sigma_4/\sigma_{SIS} \sim \gamma^2 \sim (e/3)^2 \sim 0.01$ rather than the
observed 30%. Most of this difference is a consequence of the different
magnification biases of the two image multiplicities.
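
The size of the four-image cross section relative to the total can be sketched
as follows. The closed forms are the SIS cross section, the area inside the
astroid caustic of an SIS in external shear, and the lowest-order ellipticity
expansion. The numerical check uses an astroid parametrization obtained here by
solving the SIS-plus-shear lens equation on its tangential critical curve; that
parametrization and the function names are assumptions of this sketch rather
than a transcription of Eqn. B.32.

```python
import math

def sis_cross_section(b):
    """Multiple-image cross section of an SIS with Einstein radius b."""
    return math.pi * b ** 2

def four_image_cross_section_shear(b, gamma):
    """Area inside the astroid caustic of an SIS in external shear gamma."""
    if abs(gamma) >= 1.0 / 3.0:
        raise ValueError("closed form assumes |gamma| < 1/3")
    return 1.5 * math.pi * b ** 2 * gamma ** 2 / (1.0 - gamma ** 2)

def four_image_cross_section_ellipticity(b, e):
    """Lowest-order four-image cross section for axis ratio q = 1 - e."""
    return math.pi * b ** 2 * e ** 2 / 6.0

def caustic_area_numeric(b, gamma, n=20000):
    """Shoelace area of the astroid (assumed parametrization, see lead-in)."""
    pts = []
    for i in range(n):
        phi = 2.0 * math.pi * i / n
        x = -2.0 * b * gamma * math.cos(phi) ** 3 / (1.0 + gamma)
        y = 2.0 * b * gamma * math.sin(phi) ** 3 / (1.0 - gamma)
        pts.append((x, y))
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

e = 0.3            # illustrative ellipticity
gamma = e / 3.0    # roughly equivalent external shear
print(four_image_cross_section_shear(1.0, gamma) / sis_cross_section(1.0))
print(four_image_cross_section_ellipticity(1.0, e) / sis_cross_section(1.0))
```

For this illustrative ellipticity both estimates give a four-image fraction of
about 0.015, far below the observed fraction, which is the point made in the
text about magnification bias.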

There is an important subtlety when studying lens statistics with models
covering a range of axis ratios, namely that the definition of the critical
radius b in (say) the SIE model (Eqn. B.37) depends on the axis ratio and
exactly what quantity you are holding fixed in your calculation (see Keeton,
Kochanek & Seljak 1997, Keeton & Kochanek 1998, Rusin & Tegmark 2001, Chae
2003). For example, if we compare a singular isothermal sphere to a face-on
Mestel disk with the same equatorial circular velocity, the Einstein radius of
the disk is a factor of $2/\pi$ smaller than that of the isothermal sphere
because for the same circular velocity a disk requires less mass than a sphere.
Since we usually count galaxies locally and translate these counts into a
dynamical variable, this means that lens models covering a range of
ellipticities must be normalized in terms of the same dynamical variables as
were used to count the galaxies.

Much early effort focused on the effects of adding a finite core radius to
these standard models (e.g. Blandford & Kochanek 1987b, Kochanek & Blandford
1987, Kovner 1987a, Hinshaw & Krauss 1987, Krauss & White 1992, Wallington &
Narayan 1993, Kochanek 1996a). The core radius s leads to an evolution of the
caustic structures (see Part 1, Blandford & Narayan 1986) with the ratio
between the core radius and the critical radius s/b. Strong lenses with
$s/b \ll 1$ act like singular models. Weak, or marginal, lenses with
$s/b \sim 1$ have significantly reduced cross sections but higher average
magnifications, such that the rising magnification bias roughly balances the
diminishing cross section to create a weaker than expected effect of core radii
on the probability of finding a lens (see Kochanek 1996a). As the evidence that
lenses are effectively singular has mounted, interest in these models has
waned, and we will not discuss them further here. There is some interest in
these models as a probe of large separation lenses due to groups and clusters,
where a finite core radius is replaced by the effects of the shallow
$\rho \propto r^{-1}$ NFW density cusp, and we will consider this problem in
§B.7 where we discuss large separation lenses.



B.6.4 Optical Depth 

The optical depth associated with a cross section is the fraction of the sky in
which you can place a source and see the effect. This simply requires adding
up the contributions from all the lens galaxies between the observer and the
redshift of the source. For the SIS lens we simply need to know the comoving
density of lenses per unit dark matter velocity dispersion $dn/d\sigma_v$
(which may be a function of redshift),

$$\tau_{SIS} = \int_0^{z_s} \frac{dV}{dz_l}\,dz_l \int_0^\infty \frac{dn}{d\sigma_v}\,\frac{\sigma_{SIS}}{4\pi}\,d\sigma_v \qquad \text{(B.106)}$$

where $dV/dz_l$ is the comoving volume element per unit redshift (e.g. Turner,
Ostriker & Gott 1984). For a flat cosmology, which we adopt from here on, the
comoving volume element is simply $dV = 4\pi D_d^2\,dD_d$ where $D_d$ is the
comoving distance to the lens redshift (Eqn. B.2). As with most lens
calculations, this means that the expression simplifies if expressed in terms
of the comoving angular diameter distances,

$$\tau_{SIS} = \int_0^{D_s} dD_d \left(\frac{D_d D_{ds}}{D_s}\right)^2 \int_0^\infty d\sigma_v\,\frac{dn}{d\sigma_v}\,16\pi^3\left(\frac{\sigma_v}{c}\right)^4 \qquad \text{(B.107)}$$

(Gott, Park & Lee 1989, Fukugita, Futamase & Kasai 1990). If the comoving
density of the lenses does not depend on redshift, the integrals separate to
give

$$\tau_{SIS} = \frac{8\pi^3}{15}\,D_s^3 \int_0^\infty d\sigma_v\,\frac{dn}{d\sigma_v}\left(\frac{\sigma_v}{c}\right)^4 \qquad \text{(B.108)}$$

(Fukugita & Turner 1991). If we now assume that the galaxies can be described
by the combination of Schechter luminosity functions and kinematic relations
described in §B.6.2, then we can do the remaining integral to find that

$$\tau_{SIS} = \frac{8\pi^3}{15}\,n_*\left(\frac{\sigma_*}{c}\right)^4 D_s^3\,\Gamma[1+\alpha+4/\gamma_{FJ}] = \frac{1}{30}\,\tau_*\,r_H^{-3} D_s^3\,\Gamma[1+\alpha+4/\gamma_{FJ}] \qquad \text{(B.109)}$$

where $\Gamma[x]$ is a Gamma function, $r_H = c/H_0$ is the Hubble radius and
the optical depth scale is

$$\tau_* = 16\pi^3 n_* r_H^3 \left(\frac{\sigma_*}{c}\right)^4 = 0.026 \left(\frac{n_*}{10^{-2} h^3\,\mathrm{Mpc}^{-3}}\right)\left(\frac{\sigma_*}{200\,\mathrm{km\,s}^{-1}}\right)^4. \qquad \text{(B.110)}$$

Thus, lens statistics are essentially a volume test of the cosmology (the
$D_s^3$), predicated on knowing the comoving density of the lenses ($n_*$) and
their average mass ($\sigma_*$). The result does not depend on the Hubble
constant - all determinations of $n_*$ scale with the Hubble constant such that
$n_* D_s^3$ is independent of $H_0$.
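
The optical depth scale and its dependence on the cosmological volume can be
evaluated directly. The following is a minimal sketch assuming a flat universe
containing only matter and a cosmological constant, a non-evolving lens
population, and the scaling $\tau_{SIS} = \tau_* (D_s/r_H)^3\,
\Gamma[1+\alpha+4/\gamma_{FJ}]/30$. The function names, the midpoint
integration and the choice of $\Omega_M$ values are illustrative assumptions.

```python
import math

C_KMS = 299792.458               # speed of light in km/s
HUBBLE_RADIUS = C_KMS / 100.0    # r_H = c/H_0 in h^-1 Mpc

def tau_star(n_star, sigma_star):
    """Optical depth scale 16 pi^3 n_* r_H^3 (sigma_*/c)^4.

    n_star in h^3 Mpc^-3, sigma_star in km/s; the factors of h cancel.
    """
    return (16.0 * math.pi ** 3 * n_star * HUBBLE_RADIUS ** 3
            * (sigma_star / C_KMS) ** 4)

def comoving_distance(z, omega_m, steps=2000):
    """Comoving distance in units of r_H for a flat universe (midpoint rule)."""
    dz = z / steps
    total = 0.0
    for i in range(steps):
        zi = (i + 0.5) * dz
        total += dz / math.sqrt(omega_m * (1.0 + zi) ** 3 + (1.0 - omega_m))
    return total

def tau_sis(z_s, omega_m, n_star, sigma_star, alpha, gamma):
    """SIS optical depth for a non-evolving Schechter/power-law population."""
    d_s = comoving_distance(z_s, omega_m)
    return (tau_star(n_star, sigma_star) * d_s ** 3
            * math.gamma(1.0 + alpha + 4.0 / gamma) / 30.0)

print(f"tau_* = {tau_star(1.0e-2, 200.0):.3f}")
for omega_m in (1.0, 0.3):
    tau = tau_sis(2.0, omega_m, 0.45e-2, 209.0, -0.87, 4.04)
    print(f"Omega_M = {omega_m}: tau_SIS(z_s = 2) = {tau:.1e}")
```

The larger comoving volume of the low-density model raises the optical depth,
which is the volume test described in the text.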

Two other distributions, those in image separation and in lens redshift at
fixed image separation, are easily calculated for the SIS model and useful, if
only numerically, for any other lens. The SIS image separation is
$\Delta\theta = 8\pi(\sigma_v/c)^2 D_{ds}/D_s$, so

$$\frac{d\tau_{SIS}}{d\Delta\theta} = \tau_*\left(\frac{D_s}{r_H}\right)^3 \frac{\hat{\Delta\theta}^2}{\Delta\theta_*}\Big(\Gamma[1+\alpha-2/\gamma_{FJ},\,\xi] - 2\hat{\Delta\theta}\,\Gamma[1+\alpha-4/\gamma_{FJ},\,\xi] + \hat{\Delta\theta}^2\,\Gamma[1+\alpha-6/\gamma_{FJ},\,\xi]\Big) \qquad \text{(B.111)}$$

where $\hat{\Delta\theta} = \Delta\theta/\Delta\theta_*$,
$\xi = (\Delta\theta/\Delta\theta_*)^{\gamma_{FJ}/2}$, $\Gamma[a,\xi]$ is an
incomplete Gamma function, and

$$\Delta\theta_* = 8\pi\left(\frac{\sigma_*}{c}\right)^2 = 2.3''\left(\frac{\sigma_*}{200\,\mathrm{km\,s}^{-1}}\right)^2 \qquad \text{(B.112)}$$

is the maximum separation produced by an $L_*$ galaxy. The mean image
separation,

$$\langle\Delta\theta\rangle = \frac{\Delta\theta_*}{2}\,\frac{\Gamma[1+\alpha+6/\gamma_{FJ}]}{\Gamma[1+\alpha+4/\gamma_{FJ}]}, \qquad \text{(B.113)}$$

depends only on the properties of the lens galaxy and not on cosmology. If the
cosmological model is not flat, a very weak dependence on cosmology is
introduced (Kochanek 1993c). For a known separation $\Delta\theta$, the
probability distribution for the lens redshift becomes

$$\frac{dP}{dz_l} \propto D_d^2\,\frac{dD_d}{dz_l}\,\exp\left[-\left(\frac{\Delta\theta}{\Delta\theta_*}\,\frac{D_s}{D_{ds}}\right)^2\right] \qquad \text{(B.114)}$$

(we present the result only for Schechter function $\alpha = -1$ and
Faber-Jackson $\gamma_{FJ} = 4$). The location of the exponential cut off
introduced by the luminosity function has a strong cosmological dependence, so
the presence or absence of lens galaxies at higher redshifts dominates the
cosmological limits. The structure of this function is quite different from the
total optical depth, which in a flat cosmology is a slowly varying function
with a mean lens distance equal to one-half the distance to the source. The
mean redshift changes with cosmology because of the changes in the
distance-redshift relations, but the effect is not as dramatic as the redshift
distributions for lenses of known separation.
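
The separation scale and the lens redshift distribution at fixed separation
can be sketched numerically. The snippet assumes an SIS lens population with
Schechter slope $\alpha = -1$ and Faber-Jackson slope $\gamma_{FJ} = 4$ for the
redshift distribution, a flat universe with matter density `omega_m`, and a
mean separation of $(\Delta\theta_*/2)\,\Gamma[1+\alpha+6/\gamma_{FJ}]/
\Gamma[1+\alpha+4/\gamma_{FJ}]$, which follows from weighting the velocity
function by $\sigma^6$ and $\sigma^4$. All function names and the grid
resolution are illustrative choices.

```python
import math

C_KMS = 299792.458
ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi

def delta_theta_star(sigma_star):
    """Maximum separation 8 pi (sigma_*/c)^2 of an L_* galaxy, in arcsec."""
    return 8.0 * math.pi * (sigma_star / C_KMS) ** 2 * ARCSEC_PER_RADIAN

def mean_separation(sigma_star, alpha, gamma):
    """Mean image separation in arcsec for the SIS population."""
    return (0.5 * delta_theta_star(sigma_star)
            * math.gamma(1.0 + alpha + 6.0 / gamma)
            / math.gamma(1.0 + alpha + 4.0 / gamma))

def expansion_rate(z, omega_m):
    """E(z) = H(z)/H_0 for a flat universe."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))

def comoving_distance(z, omega_m, steps=400):
    """Comoving distance in units of r_H (midpoint rule)."""
    dz = z / steps
    return sum(dz / expansion_rate((i + 0.5) * dz, omega_m)
               for i in range(steps))

def lens_redshift_distribution(separation_ratio, z_s, omega_m, bins=200):
    """Normalized dP/dz_l on a midpoint grid for Delta theta / Delta theta_*
    equal to separation_ratio (alpha = -1, gamma_FJ = 4 assumed)."""
    d_s = comoving_distance(z_s, omega_m)
    dz = z_s / bins
    redshifts, weights = [], []
    for i in range(bins):
        z_l = (i + 0.5) * dz
        d_d = comoving_distance(z_l, omega_m)
        d_ds = d_s - d_d                      # flat universe, comoving
        cutoff = math.exp(-(separation_ratio * d_s / d_ds) ** 2)
        redshifts.append(z_l)
        weights.append(d_d ** 2 / expansion_rate(z_l, omega_m) * cutoff)
    norm = sum(weights) * dz
    return redshifts, [w / norm for w in weights]

print(f"{delta_theta_star(200.0):.2f} arcsec")
print(f"{mean_separation(200.0, -1.0, 4.0):.2f} arcsec")
```

Larger separations push the exponential cutoff to lower lens redshift, so the
expected lens redshift drops as the separation grows.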

We end this section by discussing the Keeton (2002) "heresy". Keeton (2002)
pointed out that if you used a luminosity function derived at intermediate
redshift rather than locally, then the cosmological sensitivity of the optical
depth effectively vanishes when the median redshift of the lenses matches the
median redshift of the galaxies used to derive the luminosity function. The
following simple thought experiment shows that this is true at one level.
Suppose there was only one kind of galaxy and we make a redshift survey and
count all the galaxies in a thin shell at redshift z, finding N galaxies
between z and $z + \Delta z$. The implied comoving density of the galaxies,
$n = N/(\Delta z\,dV/dz)$, depends on the cosmological model with the same
volume factor appearing in the optical depth calculation (Eqn. B.106). To the
extent that the redshift ranges and weightings of the galaxy survey and a lens
survey are similar, the cosmological sensitivity of the optical depth vanishes
because the volume factor cancels and the optical depth depends only on the
number of observed galaxies N. This does not occur when we use a local
luminosity function because changes in cosmology have no effect on the local
volume element. The problem with the Keeton (2002) argument is that it
basically says that if we could use galaxy number counts to determine the
cosmological model then we would not need lensing to do so because the two are
redundant. To continue our thought experiment, we also have local estimates
$n_{local}$ for the density of galaxies, and as we vary the cosmology we would
find that n and $n_{local}$ agree only for a limited range of cosmological
models and this would restore the cosmological sensitivity. The problem is that
the comparison of near and distant measurements of the numbers of galaxies is
tricky because it depends on correctly matching the galaxies in the presence of
galaxy evolution and selection effects - in essence, you cannot use this
argument to eliminate the cosmological sensitivity of lens surveys unless you
think you understand galaxy evolution so well that you can use galaxy number
counts to determine the cosmological model, a program of research that has
basically been abandoned.

B.6.5 Spiral Galaxy Lenses 

Discussions of lens statistics, or even lenses in general, focus on early-type
galaxies (E/S0). The reason is that spiral lenses are relatively rare. The only
morphologically obvious spirals are B0218+357 (Sc, York et al. 2004),
B1600+434 (S0/Sa, Jaunsen & Hjorth 1997), PKS1830-211 (Sb/Sc, Winn et al.
2002b), PMNJ2004-1349 (Sb/Sc, Winn, Hall & Schechter 2003), and Q2237-0305
(Sa, Huchra et al. 1985). Other small separation systems may well be spiral
galaxies, but we do not have direct evidence from imaging. There are studies
of individual spiral lenses or the statistics of spiral lenses by Maller,
Flores & Primack (1997), Keeton & Kochanek (1998), Koopmans et al. (1998),
Maller et al. (2000), Trott & Webster (2002), and Winn, Hall & Schechter
(2003).

The reason lens samples are dominated by early-type galaxies is that the
early-type galaxies are more massive even if slightly less numerous (e.g.
Fukugita & Turner 1991, see §B.6.2). The relative numbers of early-type and
late-type lenses should be the ratio of their optical depths,
$(n_l/n_e)(\sigma_l/\sigma_e)^4$, based on the comoving densities and
characteristic velocity dispersions of the early and late-type galaxies. For
example, in the Kochanek et al. (2001) K-band luminosity function
$n_l/n_e \simeq 2.2$ while the ratio of the characteristic velocity dispersions
is $\sigma_{*l}/\sigma_{*e} = 0.68$, giving an expected fraction of 32% spiral.
This is modestly higher than the values using other luminosity functions
(usually closer to 20%) or the observed fraction. Because the typical
separation of the spiral lenses will also be smaller by a factor of
$(\sigma_{*l}/\sigma_{*e})^2 = 0.46$, they will be much harder to resolve given
the finite resolution of lens surveys. Thus, survey selection functions
discriminate more strongly against late-type lenses than against early-type
lenses. The higher prevalence of dust in late-type lenses adds a further bias
against them in optical surveys.
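
The expected spiral fraction follows directly from the ratio of optical depths.
A minimal sketch of the arithmetic, using only the ratios quoted in the text
(the function name is an illustrative choice, and selection effects are
ignored):

```python
def late_type_fraction(density_ratio, dispersion_ratio):
    """Fraction of lenses that are late-type, given n_l/n_e and
    sigma_*l/sigma_*e, from the optical depth ratio
    (n_l/n_e) * (sigma_*l/sigma_*e)**4."""
    tau_ratio = density_ratio * dispersion_ratio ** 4
    return tau_ratio / (1.0 + tau_ratio)

density_ratio = 2.2       # n_l / n_e for the K-band luminosity function
dispersion_ratio = 0.68   # sigma_*l / sigma_*e

print(f"late-type fraction: {late_type_fraction(density_ratio, dispersion_ratio):.2f}")
print(f"separation ratio:   {dispersion_ratio ** 2:.2f}")
```

This reproduces the 32% fraction and the factor of 0.46 in typical image
separation.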

B.6.6 Magnification Bias 

The optical depth calculation suggests that the likelihood of finding that a
$z_s \simeq 2$ quasar is lensed is very small ($\tau \sim 10^{-4}$), while
observational surveys of bright quasars typically find that of order 1% of
bright quasars are lensed. The origin of the discrepancy is the effect known as
"magnification bias" (Turner 1980), which is really the correction needed to
account for the selection of survey targets from flux limited samples. Multiple
imaging always magnifies the source, so lensed sources are brighter than the
population from which they are drawn. For example, the mean magnification of
all multiply imaged systems is simply the area over which we observe the lensed
images divided by the area inside the caustic producing multiple images,
because the magnification is the Jacobian relating area on the image and source
planes, $d^2\beta = |\mu|^{-1} d^2\theta$. For example, an SIS lens with
Einstein radius b produces multiple images over a region of radius b on the
source plane (i.e. the cross section is $\pi b^2$), and these images are
observed over a region of radius 2b on the image plane, so the mean
multiple-image magnification is $\langle\mu\rangle = (4\pi b^2)/(\pi b^2) = 4$.

Since fainter sources are almost always more numerous than brighter sources,
magnification bias almost always increases your chances of finding a lens. The
simplest example is to imagine a lens which always produces the same
magnification $\mu$ applied to a population with number counts N(F) with flux
F. The number counts of the lensed population are then
$N_{lens}(F) = \tau\,\mu^{-1} N(F/\mu)$, so the fraction of lensed objects (at
flux F) is larger than the number expected from the optical depth if fainter
objects are more numerous than the magnification times the density of brighter
objects, $N(F/\mu) > \mu N(F)$. Where did the extra factor of magnification
come from? It has to be there to conserve the total number of sources or
equivalently the area on the source and lens planes - you can always check your
expression for the magnification bias by computing the number counts of lenses
and checking to make sure that the total number of lenses equals the total
number of sources if the optical depth is unity.

Real lenses do not produce unique magnifications, so it is necessary to work
out the magnification probability distribution $P(>\mu)$ (the probability of a
magnification larger than $\mu$) or its differential $dP/d\mu$ and then
convolve it with the source counts. Equivalently we can define a magnification
dependent cross section, $d\sigma/d\mu = \sigma\,dP/d\mu$, where $\sigma$ is
the total cross section. We can do this easily only for the SIS lens, where a
source at $\beta$ (in units of b) produces two images with a total
magnification of $\mu = 2/\beta$ with $\mu > 2$ in the multiple image region
(Eqns. B.21 and B.22), to find that $P(>\mu) = (2/\mu)^2$ and
$dP/d\mu = 8/\mu^3$. The structure at low magnification depends on the lens
model, but all sensible lens models have $P(>\mu) \propto \mu^{-2}$ at high
magnification because this is generic to the statistics of fold caustics
(Part 1, Blandford & Narayan 1986).
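
The SIS magnification statistics can be verified by weighting sources by area
on the source plane. The sketch below divides the multiple-image region
(source offsets $0 < \beta < 1$ in units of b) into thin rings, which is a
discretization chosen for this illustration, and accumulates the mean
magnification and the fraction of the cross section above a threshold.

```python
def sis_total_magnification(beta):
    """Total magnification of the two SIS images for a source at offset
    beta (in units of the Einstein radius b), valid for 0 < beta < 1."""
    return 2.0 / beta

def magnification_statistics(threshold, n_rings=100000):
    """Area-weighted mean magnification and P(> threshold) over the
    multiple-image region of an SIS, using rings of width 1/n_rings."""
    d_beta = 1.0 / n_rings
    area = mean = above = 0.0
    for i in range(n_rings):
        beta = (i + 0.5) * d_beta
        weight = 2.0 * beta * d_beta        # ring area / pi, sums to 1
        mu = sis_total_magnification(beta)
        area += weight
        mean += mu * weight
        if mu > threshold:
            above += weight
    return mean / area, above / area

mean_mu, prob_above_4 = magnification_statistics(4.0)
print(mean_mu)         # mean multiple-image magnification
print(prob_above_4)    # compare with (2/4)**2
```

The mean comes out at 4 and the fraction above $\mu = 4$ at one quarter, in
line with $P(>\mu) = (2/\mu)^2$.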

Usually people have defined a magnification bias factor B(F) for sources of
flux F so that the probability p(F) of finding a lens with flux F is related to
the optical depth by $p(F) = \tau B(F)$. The magnification bias factor is

$$B(F) = N(F)^{-1} \int \frac{d\mu}{\mu}\,\frac{dP}{d\mu}\,N\!\left(\frac{F}{\mu}\right) \qquad \text{(B.115)}$$

for a source with flux F, or

$$B(m) = N(m)^{-1} \int d\mu\,\frac{dP}{d\mu}\,N(m + 2.5\log\mu) \qquad \text{(B.116)}$$

for a source of magnitude m. Note the vanishing of the extra $1/\mu$ factor
when using logarithmic number counts N(m) for the sources rather than the flux
counts N(F). Most standard models have magnification probability distributions
similar to the SIS model, with $P(>\mu) = (\mu_0/\mu)^2$ for $\mu > \mu_0$, in
which case the magnification bias factor for sources with power law number
counts $N(F) = dN/dF \propto F^{-\alpha}$ is

$$B(F) = \frac{2\mu_0^{\alpha-1}}{3-\alpha} \qquad \text{(B.117)}$$

provided the number counts are sufficiently shallow ($\alpha < 3$). For number
counts as a function of magnitude $N(m) = dN/dm \propto 10^{am}$ (where
$a = 0.4(\alpha - 1)$) the bias factor is

$$B(F) = \frac{2\mu_0^{2.5a}}{2 - 2.5a}. \qquad \text{(B.118)}$$

The steeper the number counts and the brighter the source is relative to any
break between a steep slope and a shallow slope, the greater the magnification
bias. For radio sources a simple power law model suffices, with
$\alpha \simeq 2.07 \pm 0.11$ for the CLASS survey (Rusin & Tegmark 2001),
leading to a magnification bias factor of $B \simeq 5$. For quasars, however,
the bright quasars have number counts steeper than this critical slope, so the
location of the break from the steep slope of the bright quasars to the
shallower slope for fainter quasars near $B \sim 19$ mag is critical to
determining the magnification bias. Fig. B.45 shows an example of a typical
quasar number counts distribution as compared to several (old) models for the
distribution of lensed quasars. The changes in the magnification bias with
magnitude are visible as the varying ratio of the unlensed to the lensed
counts, with a much smaller ratio for bright quasars (high magnification bias)
than for faint quasars (low magnification bias) and a smooth shift between the
two limits as you approach the break in the slope of the counts at $B \sim 19$
mag.
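
For power-law number counts the bias factor has a closed form that is easy to
check against the defining integral. The sketch below assumes
$P(>\mu) = (\mu_0/\mu)^2$ above $\mu_0$ and counts $N(F) \propto F^{-\alpha}$,
for which the integral evaluates to $2\mu_0^{\alpha-1}/(3-\alpha)$. The
function names, the upper cutoff `mu_max` and the midpoint integration in
$\ln\mu$ are choices made for this illustration.

```python
import math

def bias_factor_flux(alpha, mu0=2.0):
    """Closed-form bias factor for N(F) ~ F^-alpha and P(>mu) = (mu0/mu)^2."""
    if alpha >= 3.0:
        raise ValueError("integral diverges for alpha >= 3")
    return 2.0 * mu0 ** (alpha - 1.0) / (3.0 - alpha)

def bias_factor_magnitude(a, mu0=2.0):
    """Same factor for N(m) ~ 10^(a m), where a = 0.4 (alpha - 1)."""
    return 2.0 * mu0 ** (2.5 * a) / (2.0 - 2.5 * a)

def bias_factor_numeric(alpha, mu0=2.0, mu_max=1.0e8, steps=200000):
    """Direct midpoint integration of int dmu/mu dP/dmu N(F/mu)/N(F)."""
    ln_lo, ln_hi = math.log(mu0), math.log(mu_max)
    d_ln = (ln_hi - ln_lo) / steps
    total = 0.0
    for i in range(steps):
        mu = math.exp(ln_lo + (i + 0.5) * d_ln)
        dp_dmu = 2.0 * mu0 ** 2 / mu ** 3
        counts_ratio = mu ** alpha          # N(F/mu) / N(F)
        total += dp_dmu / mu * counts_ratio * mu * d_ln   # dmu = mu dln(mu)
    return total

alpha_class = 2.07   # CLASS radio source slope quoted in the text
print(f"{bias_factor_flux(alpha_class):.2f}")
print(f"{bias_factor_numeric(alpha_class):.2f}")
```

For the CLASS slope both routes give a bias factor of about 4.5, consistent
with the rounded value of 5 quoted above.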

For optically-selected lenses, magnification bias is "undone" by extinction in
the lens galaxy because extinction provides an effect that makes lensed quasars
dimmer than their unlensed counterparts. Since the quasar samples were
typically selected at blue wavelengths, the rest wavelength corresponding to
the quasar selection band at the redshift of the lens galaxy where it
encounters the dust is similar to the U-band. If we use a standard color excess
E(B-V) for the amount of dust, then the images become fainter by of order
$A_U E(B-V)$ magnitudes where $A_U \simeq 4.9$. Thus, if lenses had an average
extinction of only $E(B-V) \simeq 0.05$ mag, the net magnification of the
lensed images would be reduced by about 25%. If all lenses had the same
demagnification factor f < 1 then the modifications to the magnification bias
would be straightforward. For power-law number counts
$N(F) \propto F^{-\alpha}$, the magnification bias is reduced by the factor
$f^\alpha$, and an E(B-V) = 0.05 extinction leads to a 50% reduction in the
magnification bias for objects with a slope $\alpha \simeq 2$ (faint quasars)
and to still larger reductions for bright quasars. Some examples of the changes
with the addition of a simple mean extinction are shown in the right panel of
Fig. B.45, although the levels of extinction shown there are larger than
observed in typical lenses, as we discuss in §B.9.1. Comparisons between the
statistics of optically-selected and radio-selected samples can be used to
estimate the magnitude of the correction. The only such comparison found
estimated extinctions consistent with the direct measurements of §B.9.1 (Falco,
Kochanek & Muñoz 1998). However, the ISM of real lenses is presumably far more
complicated, with a distribution of extinctions and different extinctions for
different images which may be a function of orientation and impact parameter
relative to the lens galaxy, for which we have no good theoretical model.

The flux of the lens galaxy also can modify the magnification bias for faint
quasars, although the actual sense of the effect is complex. The left panel of
Fig. B.45 shows the effect of dropping lenses in which the lens galaxy
represents some fraction of the total flux of the lensed images. The correction
is unimportant for bright quasars because lens galaxies with B < 19 mag are
rare. In this picture, the flux from the lens galaxy leads to the loss of
lenses because the added flux from the lens galaxy makes the colors of faint
lensed quasars differ from those of unlensed quasars, so they are never
selected as quasars to begin with. Alternatively, if one need not worry about
color contamination, then the lens galaxy increases the magnification bias by
supplying extra flux that makes lensed quasars brighter.

Any other selection effect, such as the dynamic range allowed for flux ratios
between images as a function of their separation, will also have an effect on
the magnification bias. Exactly how the effect enters depends on the particular
class of images being considered. For example, in the SIS lens (or more
generally for two-image lenses), a limitation on the detectable flux ratio
$0 < f_{min} < 1$ sets a minimum detectable magnification
$\mu_{min} = 2(1 + f_{min})/(1 - f_{min}) > \mu_0 = 2$. Since most lens samples
have significant magnification bias, which means that most lenses are
significantly magnified, such flux limits have only modest effects. The other
limit, which cannot be captured in the SIS model, is that almost all bright
images are merging pairs on folds (or triplets on cusps), so the image
separation decreases as the magnification increases. The contrast between the
merging images and any other images also increases with increasing
magnification - combined with limits on the detectability of images, these lead
to selection effects against highly magnified images. This is also usually a
modest effect - while magnification bias is important, the statistics are
dominated by modestly magnified systems rather than very highly magnified
images. In fact, there have been few attempts at complete studies of the
complicated interactions between finding quasars, finding lenses, selection
effects and magnification bias. There is an early general study by Kochanek
(1991b) and a detailed practical application of many of these issues to the
SDSS survey by Pindor et al. (2003). Unfortunately, Pindor et al. (2003) seem
to arrive at a completeness estimate from their selection model that is too
high given the number of lenses they found in practice. Some of this may be due
to underestimating the luminosity of lens galaxies, the effects of the lens
galaxy or extinction on the selection of quasars, or the treatment of extended,
multicomponent lenses compared to normal quasars in the photometric pipeline.
These difficulties, as well as the larger size of the present radio-selected
lens samples, are the reason that almost all recent statistical studies have
focused exclusively on radio lenses.
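
The effect of a flux-ratio limit on the SIS lens can be made concrete. The
sketch below uses the SIS result that a source at offset $\beta$ (in units of
b) has image flux ratio $(1-\beta)/(1+\beta)$ and total magnification
$2/\beta$, so a detectable flux ratio limit `f_min` maps onto a minimum
magnification and onto the fraction of the unbiased cross section that is
retained, $(2/\mu_{min})^2$. The function names and the example limit of 0.1
are illustrative assumptions.

```python
def min_magnification(f_min):
    """Minimum total magnification of an SIS lens whose fainter-to-brighter
    image flux ratio is at least f_min (0 < f_min < 1)."""
    if not 0.0 < f_min < 1.0:
        raise ValueError("f_min must lie strictly between 0 and 1")
    return 2.0 * (1.0 + f_min) / (1.0 - f_min)

def retained_cross_section_fraction(f_min):
    """Fraction of the SIS multiple-image cross section with flux ratio
    above f_min, i.e. P(> mu_min) = (2 / mu_min)^2, before any
    magnification bias is applied."""
    return (2.0 / min_magnification(f_min)) ** 2

f_min = 0.1   # hypothetical survey limit: images within a factor of 10
print(f"mu_min = {min_magnification(f_min):.2f}")
print(f"retained fraction = {retained_cross_section_fraction(f_min):.2f}")
```

Because magnification bias already favors sources with magnifications well
above 2, the lenses lost to such a limit are a smaller share of a
flux-limited sample than this unbiased fraction suggests.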

The standard magnification bias expressions fEans. lB~TT5l and lB.116|l are 
not always appropriate. They are correct for the statistics of lenses selected 
from source populations for which the total flux of the source (including all 
images of a lensed source) is defining F (or m) . This is true of most existing 
surveys - for example the CLASS radio survey sources were originally selected 
from single dish observations with very poor resolution compared to typical 
image separations (see Browne et al. l2()()3|) . If, however, the separation of the 
images is large compared to the resolution of the observations and the fluxes 
of the images are considered separately, then the bias must be computed 
in terms of the bright image used to select sources to search for additional 
images. This typically reduces the bias. More subtle effects can also appear. 
For example, the SDSS survey selects quasar candidates based on the best 
fit point-source magnitudes, which will tend to be an underestimate of the 
flux of a resolved lens. Hence the magnification bias for lenses found in the 
SDSS survey will be less than in the standard theory. Samples selected based 
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Fig. B.45. Examples of selection effects on optically selected lens samples. The 
heavy solid curves in the two panels show a model for the magnitude distribution
of optically-selected quasars. The light curves labeled $\Omega_0 = 1$ and $\Lambda_0 = 1$ show the
distribution of lensed quasars for flat cosmologies that are either pure matter or pure
cosmological constant. The change in the ratio between the lensed curves and the
unlensed curves illustrates the higher magnification bias for bright quasars where
the number count distribution is steeper than for faint quasars. In the left panel the
truncated curves show the effect of losing the lensed systems where the lens galaxy
is $\Delta m = 1$, 2 or 3 magnitudes fainter than the quasars. Once surveys are searching
for lensed quasars with $B \gtrsim 20$ mag, the light from the lens galaxy becomes an
increasing problem, particularly since the systems with the brightest lens galaxies
will also have the largest image separations that would otherwise make them easily
detected. In the right panel we illustrate the effect of adding a net extinction of
$A_B = 1$ or 2 mag from dust in the lens galaxies. These correspond to larger than
expected color excesses of $E(B-V) \simeq 0.2$ and 0.4 mag respectively. Note how the
extinction "undoes" the magnification bias by shifting the lensed distributions to
fainter magnitudes.



on more than one frequency can have more complicated magnification biases
depending on the structure of the multidimensional number counts (Borgeest,
von Linde & Refsdal 1991, Wyithe, Winn & Rusin 2003). The exact behavior
is complex, but the magnification bias can be tremendously increased if the
fluxes in the bands are completely uncorrelated or tightly but nonlinearly
correlated. For example, if the luminosities in bands A and B are related by
a tight, nonlinear correlation of the form $L_A \propto L_B^{1/2}$, then the lensed examples
of these objects will lie off the correlation. At present, there are too few deep,
wide-area multiwavelength catalogs to make good use of this idea, but this
is changing rapidly.
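
The reason lensed objects leave such a correlation can be made explicit with a short worked example. It assumes, as is standard, that lensing magnifies both bands by the same factor $\mu$; the normalization $k$ is introduced here only for illustration.

```latex
% Unlensed population obeys the correlation
L_A = k\,L_B^{1/2}
% A lensed object has both bands magnified by the same factor \mu
L_A' = \mu L_A = \mu k\,L_B^{1/2}, \qquad L_B' = \mu L_B
% The correlation, evaluated at the observed L_B', predicts
k\,(L_B')^{1/2} = \mu^{1/2} k\,L_B^{1/2}
% so the lensed object sits above the correlation by
\frac{L_A'}{k\,(L_B')^{1/2}} = \mu^{1/2}
```

A source magnified by $\mu = 10$ would therefore appear roughly a factor of 3 too bright in band A for its band B luminosity, which is what makes it stand out in a two-band catalog.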

In general, the ellipticity of the lenses has little effect on the expected 
number of lenses, allowing the use of circular lens models for statistical stud- 
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Fig. B.46. Magnification contours on the image (left) and source (right) planes 
for an SIS in an external shear. The heavy solid contours show the tangential 
critical line (left) and its corresponding caustic (right). On the image plane (left), 
the light curves are magnification contours. These are positive outside the critical 
curve and negative inside the critical curve. The images found in a four-image lens 
are all found in the region between the two dashed contours - when two images are 
merging on the critical line, the other two images lie on these curves. On the source 
plane the solid (dashed) curves show the projections of the positive (negative) 
magnification contours onto the source plane. Note that the high magnification 
regions are dominated by the four-image systems with the exception of the small 
high magnification regions found just outside the tip of each cusp. 

ies that are uninterested in the morphologies of the images (e.g. Keeton, 
Kochanek & Seljak 1997, Rusin & Tegmark 2001, Chae 2003). However, the
effects of ellipticity are trivially observable in the relative numbers of two-image
and four-image lenses. We noted earlier that the expectation from the
cross section is that four-image lenses should represent of order $\epsilon_\Psi^2 \sim \gamma^2 \sim 0.01$
of lenses, where $\epsilon_\Psi$ is the ellipticity of the lens potential. Yet in §B.2 we saw
that four-image lenses represent roughly one third of the observed population.
The high abundance of four-image lenses is a consequence of the different
magnification biases of the two image multiplicities - the four-image lenses
are more highly magnified than the two-image lenses so they have a larger
magnification bias factor.

Fig. B.46 shows the image magnification contours for an SIS lens in an
external shear on both the image and source planes. The highly magnified
regions are confined to lie near the critical line. If we Taylor expand the inverse
magnification radially, then $\mu^{-1} = \Delta x\,|d\mu^{-1}/dx|$ where $\Delta x$ is the distance
from the critical line, so the magnification drops inversely with the distance
from the critical line. If we Taylor expand the lens equations, then we find
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Fig. B.47. The integral magnification probability distributions for a singular 
isothermal ellipsoid with an axis ratio of $q = 0.7$ normalized by the total cross
section for finding two images. Note that the total four-image cross section is only
of order $\epsilon_\Psi^2 \sim (\epsilon/3)^2 \sim 0.01$ of the total, but that the minimum magnification for
the four-image systems ($\mu_{min} \sim 1/\epsilon_\Psi \sim 10$) is much larger than that for the two-image
systems ($\mu_{min} \sim 2$ just as for an SIS). The entire four-image probability
distribution is well approximated by the $P(>\mu) \propto \mu^{-2}$ power law expected for
fold caustics, while the two-image probability distribution is steeper since highly
magnified images can only be created by the cusps. Figure courtesy of D. Rusin.







Fig. B.48. The expected number of two-image, four-image and three-image (disk 
or cusp) lenses as a function of axis ratio $f$ for the CLASS sample. From Rusin &
Tegmark (2001).
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that the change in source plane coordinates is related to the change in image 
plane coordinates by $\Delta\beta = \mu^{-1}\Delta x \propto \mu^{-2}$. Thus, if $L$ is the length of the
astroid curve, the probability of a magnification larger than $\mu$ scales as
$P(>\mu) \propto \mu^{-2} L/|d\mu^{-1}/dx|$. This applies only to the four-image region, because the
only way to get a high magnification in the two-image region is for the source
to lie just outside the tip of a cusp. The algebra is overly complex to present, 
but the generic result is that the region producing magnification $\mu$ extends
$\mu^{-2}$ from the cusp tip but has a width that scales as $\mu^{-1/2}$, leading to an
asymptotic cross section that declines as $P(>\mu) \propto \mu^{-2}\mu^{-1/2} = \mu^{-5/2}$
(equivalently $dP/d\mu \propto \mu^{-7/2}$) rather than $P(>\mu) \propto \mu^{-2}$. This can all be done formally (see Blandford
& Narayan 1986) so that asymptotic cross sections can be derived for any
model (e.g. Kochanek & Blandford 1987, Finch et al. 2002), but a reasonable
approximation for the four-image region is to compute the magnification, $\mu_0$,
for the cruciform lens formed when the source is directly behind the lens and
then use the estimate that $P(>\mu) = (\mu_0/\mu)^2$. Unfortunately, such simple
estimates are not feasible for the two-image region. These distributions are
relatively easy to compute numerically, as in the example shown in Fig. B.47.
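
To illustrate how a power-law tail of this form feeds into the magnification bias, the following is a minimal sketch rather than the calculation used for any published result. It assumes differential source counts $dN/dF \propto F^{-\alpha}$ with $\alpha < 3$, for which the bias integral reduces to $B = \int (dP/d\mu)\,\mu^{\alpha-1}\,d\mu$; the function names are hypothetical.

```python
import math

def bias_power_law_tail(mu0, alpha):
    """Analytic bias for P(>mu) = (mu0/mu)^2 and counts dN/dF ~ F^-alpha.

    Assumes alpha < 3 so that the integral converges.
    """
    if alpha >= 3.0:
        raise ValueError("the bias integral diverges for alpha >= 3")
    return 2.0 * mu0 ** (alpha - 1.0) / (3.0 - alpha)

def bias_numerical(mu0, alpha, n_steps=20000, mu_max_factor=1.0e6):
    """Midpoint-rule check of B = int dP/dmu * mu^(alpha-1) dmu in ln(mu)."""
    x_lo = math.log(mu0)
    x_hi = math.log(mu0 * mu_max_factor)   # truncate the tail far from mu0
    h = (x_hi - x_lo) / n_steps
    total = 0.0
    for i in range(n_steps):
        mu = math.exp(x_lo + (i + 0.5) * h)
        dp_dmu = 2.0 * mu0 ** 2 / mu ** 3   # derivative of (mu0/mu)^2
        total += dp_dmu * mu ** (alpha - 1.0) * mu * h   # d mu = mu d(ln mu)
    return total

if __name__ == "__main__":
    alpha = 2.0                              # illustrative slope
    for mu0 in (2.0, 10.0):
        print(mu0, bias_power_law_tail(mu0, alpha), bias_numerical(mu0, alpha))
```

Under these assumptions the bias scales as $\mu_0^{\alpha-1}$, so raising the minimum magnification from 2 to 10 at $\alpha = 2$ increases the bias by a factor of 5.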

Because the minimum magnification of a four-image lens increases as $\mu_0 \propto \gamma^{-1}$
even as the cross section decreases as $\sigma_4 \propto \gamma^2$, the expected number
of four-image lenses in a sample varies much more slowly with ellipticity
than expected from the cross section. The product $\sigma_4 B(F) \propto \gamma^2 \mu_0^{\alpha-1}$ of
the four-image cross section, $\sigma_4$, and the magnification bias, $B(F)$, scales
as $\gamma^{3-\alpha} \propto \gamma$ for the CLASS survey ($\alpha \simeq 2$), which is a much more gentle
dependence on ellipticity than the quadratic variation expected from the
cross section. There is a limit, however, to the fraction of four-image lenses.
If the potential becomes too flat, the astroid caustic extends outside the radial
caustic (Fig. B.18) to produce three-image systems in the "disk" geometry
rather than additional four-image lenses. In the limit that the axis ratio
goes to zero (the lens becomes a line), only the disk geometry is produced.
The existence of a maximum four-image lens fraction, and its location at an
axis ratio inconsistent with the observed axis ratios of the dominant early-type
lenses, has made it difficult to explain the observed fraction of four-image
lenses (King & Browne 1996, Kochanek 1996b, Keeton, Kochanek &
Seljak 1997, Keeton & Kochanek 1998, Rusin & Tegmark 2001). Recently,
Cohn & Kochanek (2004) argued that satellite galaxies of the lenses provide
the explanation by somewhat boosting the fraction of four-image lenses while
at the same time explaining the existence of the more complex lenses like
B1359+154 (Myers et al. 1999, Rusin et al. 2001) and PMNJ0134-0931 (Winn
et al. 2002c, Keeton & Winn 2003) formed by having multiple lens galaxies
with more complex caustic structures. It is not, however, clear in the existing
data that four-image systems are more likely to have satellites to the lens
galaxy than two-image systems as one would expect for this explanation.

Gravitational lenses can produce highly magnified images without multi- 
ple images only if they are highly elliptical or have a low central density. 






The SIS lens has a single-image magnification probability distribution of
$\tau\,dP/d\mu = 2\pi b^2/(\mu-1)^3$ with $\mu < 2$ compared to $\tau\,dP/d\mu = 2\pi b^2/\mu^3$ with
$\mu > 2$ for the multiply imaged region, so single images are never magnified
by more than a factor of 2. For galaxies, where we always expect high central 
densities, the only way to get highly magnified single images is when the 
astroid caustic extends outside the radial caustic (Fig. B.18). A source just
outside an exposed cusp tip can be highly magnified with a magnification
probability distribution $dP/d\mu \propto \mu^{-7/2}$. Such single image magnifications
have recently been a concern for the luminosity function of high redshift
quasars (e.g. Wyithe 2004, Keeton, Kuhlen & Haiman 2004) and will be the
high magnification tail of any magnification perturbations to supernova fluxes
(e.g. Dalal et al. 2003). As a general rule for galaxies, the probability of a
single image being magnified by more than a factor of two is comparable to 
the probability of being multiply imaged. 
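
The SIS limits quoted above are easy to check with a small Monte Carlo. The sketch below is illustrative only: it works in units of the Einstein radius $b$, uses the standard SIS results that a source at radius $\beta < 1$ has total magnification $2/\beta$ while a source at $\beta > 1$ has a single image with magnification $1 + 1/\beta$, and all function names and sample sizes are arbitrary choices.

```python
import math
import random

def sis_total_magnification(beta):
    """Total magnification of an SIS for a source at radius beta (units of b)."""
    if beta < 1.0:
        return 2.0 / beta          # two images: (1/beta + 1) + (1/beta - 1)
    return 1.0 + 1.0 / beta        # one image outside the multiple-image region

def simulate(n_sources=200000, beta_max=2.0, mu_cut=4.0, seed=1):
    """Drop sources uniformly in area and collect magnification statistics.

    Returns the largest single-image magnification and the fraction of
    multiply imaged sources with total magnification above mu_cut.
    """
    rng = random.Random(seed)
    max_single = 0.0
    n_multiple = 0
    n_above_cut = 0
    for _ in range(n_sources):
        # sqrt of a uniform deviate gives a uniform density in area;
        # using 1 - random() keeps beta strictly positive
        beta = beta_max * math.sqrt(1.0 - rng.random())
        mu = sis_total_magnification(beta)
        if beta < 1.0:
            n_multiple += 1
            if mu > mu_cut:
                n_above_cut += 1
        else:
            max_single = max(max_single, mu)
    return max_single, n_above_cut / n_multiple

if __name__ == "__main__":
    max_single, frac = simulate()
    print("largest single-image magnification:", max_single)
    print("fraction of multiple-image sources with mu > 4:", frac)
```

The single-image magnifications stay below 2, and the fraction of multiply imaged sources above a total magnification $\mu$ approaches $(2/\mu)^2$, the $\mu^{-2}$ tail discussed for fold caustics.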

B.6.7 Cosmology With Lens Statistics 

The statistics of lenses, in the sense of the number of lenses expected in a sample
of sources as a function of cosmology, is a volume test of the cosmological
model because the optical depth (at least for flat cosmologies) is proportional
to $D_s^3$. However, the number of lenses also depends on the comoving density
and mass of the lenses ($n_*$, $\sigma_*$ and $\alpha$ in the simple SIS model). While $n_*$
could plausibly be estimated locally, the $\sigma_*^4$ dependence on the mass scale
makes it very difficult to use local estimates of galaxy kinematics or masses
to normalize the optical depth. The key step to eliminating this problem is
to note that there is an intimate relation between the cross section, the observed
image separations and the mass scale. While this will hold for any mass
model, the SIS model is the only simple analytic example. The mean image
separation for the lenses should be independent of the cosmological model
for flat cosmologies (and only weakly dependent on it otherwise). Thus, in
any lens sample you can eliminate the dependence on the mass scale by replacing
it with the observed mean image separation, $\tau_{SIS} \propto n_* \langle\Delta\theta\rangle^2 D_s^3$. Full
calculations must include corrections for angular selection effects. Most odd
results in lens cosmology arise in calculations that ignore the close coupling
between the image separations and the cross section.
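
The substitution can be written out in two steps. This is only a sketch of the scaling argument; it assumes the standard SIS result that the image separation scales as the square of the velocity dispersion and drops all numerical factors.

```latex
% Optical depth of the SIS model in terms of the mass scale
\tau_{SIS} \propto n_* \sigma_*^4 D_s^3
% SIS image separations scale with the square of the velocity dispersion
\langle\Delta\theta\rangle \propto \sigma_*^2
\quad\Longrightarrow\quad
\sigma_*^4 \propto \langle\Delta\theta\rangle^2
% so the mass scale can be traded for the observed mean separation
\tau_{SIS} \propto n_* \langle\Delta\theta\rangle^2 D_s^3
```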

In practice, real calculations are based on variations of the maximum
likelihood method introduced by Kochanek (1993b, 1996a). For each lens i
you compute the probability $p_i$ that it is lensed including magnification bias
and selection effects. The likelihood of the observations is then

$$\ln L = \sum_{\rm lenses} \ln p_i + \sum_{\rm unlensed} \ln(1-p_i) \simeq \sum_{\rm lenses} \ln p_i - \sum_{\rm unlensed} p_i \qquad (B.119)$$

where $\ln(1-p_i) \simeq -p_i$ provided $p_i \ll 1$. This simply encodes the likelihood of
finding the observed number of lenses given the individual probabilities that
finding the observed number of lenses given the individual probabilities that 
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the objects are lensed. Without further information, this likelihood could 
determine the limits on the cosmological model only to the extent we had 
accurate prior estimates for $n_*$ and $\sigma_*$.
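
As a concrete illustration of the likelihood in Eqn. B.119, the sketch below evaluates both the exact form and the small-probability approximation. The function name and the example probabilities are made up for illustration; in a real analysis each probability would come from the lens model, magnification bias and selection function.

```python
import math

def log_likelihood(p_lensed, p_unlensed, exact=False):
    """Log-likelihood of a survey given per-object lensing probabilities.

    p_lensed   : probabilities p_i for the objects found to be lensed
    p_unlensed : probabilities p_i for the objects found to be unlensed
    exact      : use ln(1 - p_i) instead of the approximation -p_i
    """
    ln_l = sum(math.log(p) for p in p_lensed)
    if exact:
        ln_l += sum(math.log1p(-p) for p in p_unlensed)
    else:
        ln_l -= sum(p_unlensed)    # ln(1 - p) ~ -p for p << 1
    return ln_l

if __name__ == "__main__":
    # Hypothetical survey: 2 lenses among 1002 sources, each p_i ~ 0.1-0.4%
    p_lensed = [0.002, 0.004]
    p_unlensed = [0.001] * 1000
    print(log_likelihood(p_lensed, p_unlensed))
    print(log_likelihood(p_lensed, p_unlensed, exact=True))
```

For probabilities this small the two forms differ by only a few parts in $10^4$, which is why the approximate expression is normally used.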

If we add, however, a term for the probability that each detected lens has
its observed separation (Eqn. B.112 plus any selection effects)



then the lens sample itself can normalize the typical mass scale of the lenses 
(Kochanek 1993b). This has two advantages. First, it eliminates any system- 
atic problems arising from the dynamical normalization of the lens model and 
its relation to the luminosity function. Second, it forces the cosmological es- 
timates from the lenses to be consistent with the observed image separations 
- it makes no sense to produce cosmological limits that imply image separations
inconsistent with the observations. In theory the precision exceeds that
of any local calibration very rapidly. The fractional spread of the separations
about the mean is ~ 0.7, so the fractional uncertainty in the mean separation
scales as $0.7/N^{1/2}$ for a sample of $N$ lenses. Since the cross section goes as
the square of the mean separation, the fractional uncertainty in the mean cross section,
$1.4/N^{1/2}$, becomes as small as any plausible accuracy of a local normalization for $\sigma_*$ (10%
in $\sigma_*$, or 20% in $\langle\theta\rangle \propto \sigma_*^2$, or 40% in $\tau \propto \sigma_*^4$) with only $N \sim 10$ lenses.
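
The arithmetic behind this comparison is short enough to write out. The symbol $\sigma_{\rm cross}$ for the cross section is introduced here only to avoid confusion with the velocity dispersion.

```latex
% Fractional uncertainty in the mean separation from N lenses
\frac{\delta\langle\Delta\theta\rangle}{\langle\Delta\theta\rangle} \simeq \frac{0.7}{N^{1/2}}
% The cross section scales as the square of the separation, doubling the fractional error
\frac{\delta\sigma_{\rm cross}}{\sigma_{\rm cross}} \simeq 2 \times \frac{0.7}{N^{1/2}} = \frac{1.4}{N^{1/2}}
% For N = 10 lenses
\frac{1.4}{\sqrt{10}} \simeq 0.44
% A 40% local uncertainty is matched at
N = \left(\frac{1.4}{0.4}\right)^2 \simeq 12
```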

Any other measurable property of the lenses can be added to the like- 
lihood, but the only other term that has been seriously investigated is the 
probability of the observed lens redshift given the image separations and the 
source redshift (Kochanek 1992a, 1996a, Helbig & Kayser 1996, Ofek, Rix &
Maoz 2003). In general, cosmologies with a large cosmological constant predict
significantly higher lens redshifts than those without, and in theory this
is a very powerful test because of the exponential cutoff in Eqn. B.114. The
biggest problem in actually using the redshift test, in fact so big that it probably
cannot be used at present, is the high incompleteness of the lens redshift
measurements. There will be a general tendency, even at fixed separation,
for the redshifts of the higher redshift lens galaxies to be the ones that
are unmeasured. Complete samples could be defined for a separation range,
usually by excluding small separation systems, but a complete analysis needs
to include the effects of groups and clusters boosting image separations beyond
the splitting produced by an isolated galaxy. For example, how do we
include Q0957+561 with its separation of 6.2 arcsec that is largely due to the lens
galaxy but has significant contributions from the surrounding cluster?

B.6.8 The Current State 

Recent analyses of lens statistics have focused exclusively on the CLASS flat
spectrum radio survey (Browne et al. 2003). Chae et al. (2002), Chae (2003)
and Mitchell et al. (2004) focus on estimating the cosmological model and find
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Mitchell et al. (2004)^ 
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Fig. B.49. (Top) Likelihood functions for the cosmological model from Mitchell 
et al. (2004) using the velocity function of galaxies measured from the SDSS survey
and a sample of 12 CLASS lenses. The contours show the 68, 90, 95 and 99%
confidence intervals on the cosmological model. In the shaded regions the cosmo- 
logical distances either become imaginary or there is no big bang. (Bottom) The 
histogram shows the separation distribution of the 12 CLASS lenses used in the 
analysis and the curve shows the distribution predicted by the maximum likelihood 
model including selection effects. 
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results in general agreement with estimates from Type la supernovae (e.g. 
Riess et al. 2004). The general approach of both groups is to use variants
of the maximum likelihood methods described above in §B.6.7. Chae (2003)
uses an obsolete estimate of the galaxy luminosity function combined with
a Faber-Jackson relation and the variable transformation of Eqn. B.103 but
normalizes the velocity scale using the observed distribution of lens separations.
Mitchell et al. (2004) use the true velocity dispersion function from the
SDSS survey (Sheth et al. 2003) and incorporate a Press-Schechter (1974)
model for the evolution of the velocity function. Chae (2003) used ellipsoidal
galaxies, although this has little cosmological effect, while Mitchell
et al. (2004) considered only SIS models. Fig. B.49 shows the cosmological
limits from Mitchell et al. (2004), which are typical of the recent results.
There are also attempts to use lens statistics to constrain dark energy (e.g.
Chae et al. 2004, Kuhlen, Keeton & Madau 2004), but far larger, well-defined
samples are needed before the resulting constraints will become interesting.

Chae & Mao (2003), Davis, Huterer & Krauss (2003) and Ofek, Rix &
Maoz (2003) focused on galaxy properties and evolution in a fixed, concordance
cosmology rather than on determining the cosmological models.
Mitchell et al. (2004) compared models where the lenses evolved following
the predictions of CDM models to non-evolving models. Because
lens statistical estimates are unlikely to compete with other means of
estimating the cosmological models, these are more promising applications of
gravitational lens statistics for the future. Attempts to estimate the evolution
of the lens population usually allow the $n_*$ and $\sigma_*$ parameters of the velocity
function (Eqn. B.103) to evolve as power laws with redshift. Mitchell et
al. (2004, Fig. B.44) point out that CDM halo models make specific predictions
for the evolution of the velocity function that have a different structure
from simple power laws in redshift, but with the present data the differences
are probably unimportant. All these evolution studies came to the conclusion
that the number density of the $\sigma_v \sim \sigma_*$ galaxies which dominate lens
statistics has changed little ($\lesssim \pm50\%$) between the present day and redshift
unity.

I have three concerns about these analyses and their focus on the "com- 
plete" CLASS lens samples. First, a basic problem with the CLASS survey 
is that we lack direct measurements of the redshift distribution of the source 
population forming the lenses (e.g. Marlow et al. 2000, Muñoz et al. 2003).
In particular, Muñoz et al. (2003) note that the radio source population is
changing radically from nearly all quasars to mostly galaxies as you approach 
the fluxes of the CLASS source population. This makes it dangerous to ex- 
trapolate the source population redshifts from the brighter radio fluxes where 
the redshift samples are nearly complete to the fainter samples where they 
are not. The second problem is that no study has a satisfactory treatment 
of the lenses with satellites or associated with clusters. All the analyses use 
isolated lens models and then either include lenses with satellites but ignore 
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the satellites or drop lenses with satellites and ignore the fact that they have 
been dropped. The analysis by Cohn & Kochanek (2004) of lens statistics
with satellites shows that neither approach is satisfactory - dropping the
satellites biases the results to underestimate cross sections while including
them does the reverse. Cohn & Kochanek (2004) concluded that including
the systems with satellites probably has fewer biases than dropping them. A
similar problem probably arises from the effects of the group halos to which
many of the lenses belong (e.g. Keeton et al. 2000, Fassnacht & Lubin 2002).
My third concern is that the separations of the radio lenses seem to be systematically
smaller than the optically selected lenses even though the optical
HST Snapshot survey (Maoz et al. 1993) had the greatest sensitivity to small
separation systems. It is possible that this is simply due to selection effects
in the optical samples, but I have seen no convincing scenario for producing
such a selection effect. We see no clear correlation of extinction with image
separation (see §B.9.1), emission from the lens galaxy is less important for
small separation systems than for large separation systems, and the selection
function due to the resolution of the observations is fairly simple to model.

On the other hand, the various lens samples may all be consistent. One way
to compare the different data sets is to non-parametrically construct the velocity
function from the observed image separations of the samples. To do
this we assume an SIS lens model for the conversion from image separations
to circular velocities, and then adopt the standard non-parametric methods
used to construct luminosity functions from redshift surveys to construct the
velocity function from the image separations (Kochanek 2003c). The results
for the flat-spectrum lens surveys (CLASS, JVAS, PANELS), all radio surveys
and all radio surveys plus the quasar lenses are shown in Fig. B.50. We
normalized the estimates to the density at $v_c = 300$ km/s to eliminate any
dependence on the cosmological model. The lens data can estimate the velocity
function from roughly $v_c \sim 100$ km/s to 500 km/s. At lower velocities the
finite resolution of the observations makes the uncertainties in the density
explode, and at higher velocities the surveys have not searched large enough
angular regions around the lens galaxies. The shape of the velocity function
is consistent with local estimates (Fig. B.42) except in the highest circular
velocity bin where we begin to see the contribution from clusters we will consider
in §B.7. Fig. B.50 also makes it clear why constraints on the evolution of
the lenses are so weak - evolution estimates basically try to compare the low- 
redshift separation distribution to the high redshift separation distribution, 
and we simply do not have large enough lens samples to begin subdividing 
them in redshift (to say nothing of dealing with unmeasured redshifts) and 
still have small statistical uncertainties. 
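
The conversion between image separation and circular velocity used in this reconstruction can be sketched in a few lines. This is an illustration under stated assumptions, not the code behind Fig. B.50: it uses the standard SIS relations $\Delta\theta = 2b = 8\pi(\sigma/c)^2 D_{ds}/D_s$ and $v_c = \sqrt{2}\,\sigma$, treats the distance ratio $D_{ds}/D_s$ as a free input, and the function names are hypothetical.

```python
import math

C_KM_S = 299792.458                          # speed of light in km/s
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi    # radians to arcseconds

def sis_separation_arcsec(v_c, dist_ratio=1.0):
    """Image separation 2b = 4 pi (v_c/c)^2 (D_ds/D_s) of an SIS, in arcsec.

    dist_ratio = 1 gives the maximum separation for a given v_c.
    """
    theta_rad = 4.0 * math.pi * (v_c / C_KM_S) ** 2 * dist_ratio
    return theta_rad * ARCSEC_PER_RAD

def circular_velocity_km_s(separation_arcsec, dist_ratio=0.5):
    """Invert the SIS relation to estimate v_c (km/s) from a separation.

    The default dist_ratio = 0.5 corresponds to a mean separation of
    one-half the maximum; it is an assumption, not a measured value.
    """
    theta_rad = separation_arcsec / ARCSEC_PER_RAD
    return C_KM_S * math.sqrt(theta_rad / (4.0 * math.pi * dist_ratio))

if __name__ == "__main__":
    for v_c in (100.0, 300.0, 500.0):
        print(v_c, "km/s ->", round(sis_separation_arcsec(v_c), 2), "arcsec (max)")
```

With these relations the range $v_c \sim 100$ to 500 km/s maps onto maximum separations of a few tenths of an arcsecond to several arcseconds, which is why survey resolution limits the low-velocity end and the angular search radius limits the high-velocity end.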
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Fig. B.50. Non-parametric reconstructions of the velocity function from the ob- 
served separations of gravitational lenses assuming an SIS lens model. The velocity 
functions are all normalized to the bin centered at 300 km/s. The filled squares use 
only the lenses in the flat spectrum radio surveys, the triangles use all radio-selected 
lenses and the pentagons include all radio lenses and all quasar lenses. The horizon- 
tal error bars on the filled squares show the bin widths. The triangles and pentagons 
are horizontally offset from the squares to make them more visible. The curves show 
the velocity function estimated from the 2MASS sample from Fig. B.42. The horizontal
scale at the top of the figure shows the maximum separation produced by a
lens of the corresponding circular velocity. The mean separation produced by such 
a lens will be one-half the maximum. 



B.7 What Happened to The Cluster Lenses? 

One would think from the number of conference proceeding covers featuring 
HST images of cluster arcs that these are by far the most common type of lens. 
In fact, this is an optical delusion created by the ease of finding the rich clus- 
ters even though they are exponentially rare. The most common kind of lens 
is the one produced by a typical massive galaxy - as we saw in Fig. B.50.
For comparison, Fig. B.51 shows several estimates of the velocity function
based on standard CDM mass functions and halo models (from Kochanek
& White 2001 and Kochanek 2003c, using the Sheth & Tormen 1999 mass
function combined with the NFW halo model from §B.4.1). We see for high
masses or circular velocities that the predicted distribution of halos agrees
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Fig. B.51. The expected circular velocity function dn/dlogv c of CDM halos The 
lowest dashed curve labeled NFW $v_{vir}$ shows the velocity function using the NFW
halo virial velocity $v_{vir}$ for the circular velocity (see §B.4.1). The middle dashed
curve labeled NFW $v_{c,max}$ shows the velocity function if the peak circular velocity
of the halo is used rather than the virial velocity. The upper dashed curve is a
model in which the baryons of halos with $M \lesssim 10^{13} M_\odot$ cool, raising the central
density and circular velocity. The solid curve with the points shows the estimate of
the local velocity function of galaxies (Fig. B.42) and the solid curve extending to
higher velocities is an estimate of the local velocity function of groups and clusters. 



with the observed distribution of clusters. At the velocities typical of galaxies, 
the observed density of galaxies is nearly an order of magnitude higher than 
expected for a CDM halo mass function. At very low velocities we expect 
many more halos than we observe galaxies. The velocity function estimated 
from the observed image separations matches that of galaxies with the begin- 
nings of a tail extending onto the distribution of clusters at the high velocity 
end (Fig. B.50). At low velocities the limited resolution of the present surveys
means that the current lens data do not probe the low velocity end
very well. In this section we discuss the difference between cluster and galaxy
lenses and explain the origin of the break between galaxies and clusters. In
§B.8 on CDM substructure we will discuss the divergence at low circular
velocities. 
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Fig. B.52. Predicted image separation distributions assuming the structure of 
halos does not change with halo mass. The heavy solid line shows the prediction 
for pure NFW models while the light solid (dashed) curves show the predictions
after 5% of the baryons have cooled into a disk (a disk plus a bulge with 10% of 
the baryonic mass in the bulge). The curves labeled CLASS (for the CLASS survey 
lenses) and all radio (for all radio selected lenses) show the observed distributions. 



The standard halo mass function is roughly a power law with $dn/dM \sim M^{-1.8}$
combined with an exponential cutoff at the mass scale corresponding
to the largest clusters that could have formed at any epoch (e.g. the Sheth
& Tormen 1999 halo mass function). Typically these rich clusters have internal
velocity dispersions above 1000 km/s and can produce image splittings of
~ 30 arcsec. If halo structure were independent of mass, then we would expect
the separation distribution of gravitational lenses to show a similar structure
- a power law out to the mass scale of rich clusters followed by an exponential
cutoff. In Fig. B.52 we compare the observed distribution of radio lenses
to that expected from the halo mass function assuming either NFW halos
or NFW halos in which the baryons, representing 5% of the halo mass, have
cooled and condensed into the centers of the halos (Kochanek & White 2001).
We would find similar curves if we used simple SIS models rather than these
more complex CDM-based models (Keeton 1998, Porciani & Madau 2000). In
practice, the most complete survey for multiply imaged sources, the CLASS
survey, found a largest separation of 4.5 arcsec (B2108+213) despite checking candidates
out to separations of 15.0 arcsec (Phillips et al. 2001). The largest lens found
in a search for multiply imaged sources has an image separation of roughly
15 arcsec (SDSS1004+4112, Inada et al. 2003). The overall separation distri-
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Fig. B.53. (Top) The rotation curve and (Bottom) the bending angle a(x) for a 
$10^{12} M_\odot$ halo at $z_l = 0.5$ with a concentration of $c = 8$ lensing a source at $z_s = 2.0$.
The dashed curves show the results for the initial NFW halo, while the solid curves
show the results after allowing 5% of the mass to cool conserving angular momentum
(spin parameter $\lambda = 0.04$) and adiabatically compressing the dark matter. The three
solid curves show the effect of putting 0%, 10% or 20% of the baryonic mass into
a central bulge. Higher bulge masses raise the central circular velocity and steepen
the central deflection profile. The final disk scale length is $r_d$. Compare these to the
bending angles of our simple models in Figs. B.10-B.14.
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bution (see Fig. IB. 52(1 has a sharp cutoff on scales of 3 arcsec corresponding 
to galaxies with velocity dispersions of ~ 250 km/s. The principal searches 
for wide separation lenses are Maoz et al. (1997), Ofek et al. (2001) and
Phillips et al. (2001), although most surveys searched for image separations
out to at least 6.0 arcsec. A large number of studies focused only on the properties
of lenses produced by CDM mass functions (e.g. Narayan & White 1988,
Wambsganss et al., Kochanek 1995b, Maoz et al. 1997, Flores &
Primack 1996, Mortlock & Webster 2000b, Li & Ostriker 2002, Keeton &
Madau 2001, Wyithe, Turner & Spergel 2001). We will not discuss these in
detail because such models cannot reproduce the observed separation distributions
of lenses. Most recent analyses allow for changes in the density
distributions between galaxies and clusters.

Physically the important difference between galaxies and clusters is that 
the baryons in the galaxies have cooled and condensed into the center of the 
halo to form the visible galaxy. As the baryons cool, they also drag some of 
the dark matter inward through a process known as adiabatic compression 
(Blumenthal et al. 1986), although this is less important than the cooling. As 
we show in Fig. B.53, standard dark matter halos are terrible lenses because
their central cusps ($\rho \propto r^{-\gamma}$ and $1.5 > \gamma > 1$) are too shallow. In this case,
a standard NFW halo with a total mass of $10^{12} M_\odot$ and a concentration of
$c = 8$ (see Eqns. B.60-B.62) at a redshift of $z_l = 0.5$ is unable to produce
multiple images of a source at redshift $z_s = 2$ despite having an asymptotic
circular velocity of nearly 200 km/s. If we now assume that 5% of the mass is
in baryons starting with a typical halo angular momentum and then cooling
into a disk of radius $r_d$ while conserving angular momentum, we see that the
rotation curve becomes flatter and the galaxy is now able to produce multiple
images. Putting some fraction of the mass into a still more compact, central
bulge makes the lens even more supercritical and the bending angle diagram
begins to resemble that of an SIS lens (see Fig. B.11). Thus, the cooling
of the baryons converts a sub-critical dark matter halo into one capable of 
producing multiple images. 

The key point is that only intermediate mass halos contain baryons which 
have cooled. High mass halos (groups and clusters) have cooling times longer 
than the Hubble time so they have not had time to cool (e.g. Rees & Ostriker
1977). Most low mass halos also probably resemble dark matter halos
more than galaxies with large quantities of cold baryons because they lost
their baryons due to heating from the UV background during the initial
period of star formation (e.g. Klypin et al. 1999, Bullock, Kravtsov & Weinberg
2000, see §B.8). Here we ignore the very low mass halos and consider
only the distinction between galaxies and groups/clusters. The fundamental
realization in recent studies (e.g. Porciani & Madau 2000, Kochanek &
White 2001, Kuhlen, Keeton & Madau 2004, Li & Ostriker 2003) is that introducing
a cooling mass scale $M_c$ below which the baryons cool to form galaxies
and above which they do not supplies the explanation for the difference be-
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Fig. B.54. (Top) Predicted separation distributions as a function of the cooling 
mass scale M c in which 5% of the mass cools with 90% of the cooled material in 
a disk and 10% in a bulge. The dashed curves show the distributions for M c = 
1O 12 M , 3 x 1O 12 M and 10 13 M Q , while the solid curves show the distributions 
for M c = 3 x 10 13 M Q , 10 14 M q and 3 x 1O 14 M . The heavy solid (dashed) curves 
shows the observed distribution of the CLASS (all radio-selected) lenses. 

Fig. B.55. (Bottom) The Kolmogorov-Smirnov probability, Pks, of fitting the ob- 
served distribution of CLASS lenses as a function of the cooling mass scale M c . 
The heavy solid curves show the results when 5% of the mass cools without (with) 
10% of that mass in a bulge. The heavy dashed curves show the results for models 
where lower (1% and 2%) or higher (10% and 20%) halo mass fractions cool, where 
the optimal cooling mass scale M c decreases as the cold baryon fraction increases. 
For comparison, the light dashed line shows the cooling time t coo i in units of 10 Gyr 
for the radius enclosing 50% of the baryonic mass in the standard model. The light 
solid line shows the average formation epoch, (tj orm ), also in units of 10 Gyr. 
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tween the observed separation distribution of lenses and naive estimates from 
halo mass functions. 

Once we recognize the necessity of introducing a distinction between
cluster and galaxy mass halos, we can use the observed distribution of lens
separations to constrain the mass scale of the break and the physics of
cooling. Fig. B.54 shows the most common version of these studies, where
separation distributions are computed as a function of the cooling mass
scale $M_c$. We show the separation distributions for various cooling mass
scales assuming that 5% of the mass cools into a disk plus a bulge with 10%
of the baryonic mass in the bulge for all halos with $M < M_c$. If the
cooling mass is either too low or too high we return to the models of
Fig. B.52, while at some intermediate mass scale we get the break in the
separation distribution to match the observed angular scale. For these
parameters, the optimal cooling mass scale is $M_c \simeq 10^{13} M_\odot$
(Fig. B.55). This agrees reasonably well with Porciani & Madau (2000) and
Kuhlen, Keeton & Madau (2004), who found a somewhat higher mass scale
$M_c \simeq 3 \times 10^{13} M_\odot$ using SIS models for galaxies.
Cosmological hydrodynamic simulations by Pearce et al. (1999) also found
that approximately 50% of the baryons had cooled on mass scales near
$10^{13} M_\odot$. Note, however, that the mass scale needed to fit the
data depends on the assumed fraction of the mass in cold baryons. With
fewer cold baryons a halo becomes a less efficient lens producing smaller
image separations, so $M_c$ must increase to keep the break at the observed
scale. If the cold baryon fraction is too low ($\lesssim 1\%$), it becomes
impossible to explain the data at all. Crudely, the cooling mass scale
depends exponentially on the cold baryon fraction with
$\log M_c/M_\odot \simeq 13.6 - (\mbox{cold fraction})/0.15$.
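
As a rough numerical illustration of this crude scaling, the short sketch
below simply evaluates the quoted relation for a few cold baryon fractions.
The function names are hypothetical, and the relation is only the
approximate fit stated above, not a physical cooling model.

```python
def log10_cooling_mass(cold_fraction):
    """Crude fit quoted in the text: log10(M_c / M_sun) ~ 13.6 - f_cold / 0.15.

    cold_fraction is the fraction of the halo mass in cold baryons
    (e.g. 0.05 for 5%). This is an approximate scaling, not a derivation.
    """
    return 13.6 - cold_fraction / 0.15


def cooling_mass(cold_fraction):
    """Cooling mass scale M_c in solar masses implied by the crude fit."""
    return 10.0 ** log10_cooling_mass(cold_fraction)


if __name__ == "__main__":
    # Fewer cold baryons -> higher cooling mass scale, as argued in the text.
    for f_cold in (0.01, 0.02, 0.05, 0.10, 0.20):
        print(f"f_cold = {f_cold:4.2f}  ->  M_c ~ {cooling_mass(f_cold):.2e} M_sun")
```

For a 5% cold fraction this gives a cooling mass scale between $10^{13}$
and $3 \times 10^{13} M_\odot$, in line with the values discussed above.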

The mass scale of the break and the cold baryon fraction are not
independent parameters and should be derivable from the physics of the
cooling gas. In its full details this must include not only the cooling of
the gas but also reheating of the gas in galaxies due to feedback from star
formation. Fig. B.55 also shows the dependence of the cooling time scale
and the formation time scale for halos of mass $M_c$. For this model (based
on the semi-analytic models of Cole et al. 2000), the cooling time becomes
shorter than the age of the halo very close to the mass scale required to
explain the distribution of image separations. These semi-analytic models
suggest an alternate approach in which the cooling mass scale need not be
added as an ad hoc parameter. We could instead follow the semi-analytic
models and use the cooling function to determine the relative cooling rates
of halos with different masses. We leave as the free parameter the final
cosmological density in cold baryons, $\Omega_{b,cool} \lesssim \Omega_b
\simeq 0.04$ (i.e. some baryons may never cool, or cool and are reheated by
feedback). Low $\Omega_{b,cool}$ models have difficulty cooling, making
them equivalent to models with a low cooling mass scale. High
$\Omega_{b,cool}$ models cool easily, making them equivalent to models with
a high cooling mass scale. Models with $0.015 \lesssim \Omega_{b,cool}
\lesssim 0.025$ agree with the observations. The result depends little on
whether we add a bulge, fit the CLASS sample or all radio lenses, or adjust
the cooling curve by a factor of two. Thus, the characteristic scale of the
gravitational lens separation distribution is a probe of the cosmological
baryon density $\Omega_b$ and the fraction of those baryons that cool in
the typical massive galaxy. While it would be premature to use this as a
method for determining $\Omega_b$, it is interesting to note that our
estimate is significantly below current cosmological estimates that
$\Omega_b \simeq 0.04$, which would be consistent with feedback from star
formation and other processes preventing all baryons from cooling, but well
above the estimates of the cold baryon fraction in local galaxies
($0.0045 \lesssim \Omega_{b,cool} \lesssim 0.0068$, Fukugita, Hogan &
Peebles 1998). These are also the models generating the velocity function
estimate with baryonic cooling in Fig. B.51. The cooling of the baryons
shifts the more numerous low velocity halos to higher circular velocities
so that the models match the observed density of $\sigma_v \sim \sigma_*$
galaxies. The models do not correctly treat the break region because they
allow "over-cooled" massive groups, but then merge back onto the peak
circular velocity distribution of the CDM halos at higher velocities. Since
the models allow all low mass halos to cool, there is still a divergence at
low circular velocities which is closely related to the problem of CDM
substructure we discuss in §B.8.

Fig. B.56. (Top) Predicted separation distributions as a function of the
cosmological cold baryon density $\Omega_{b,cool}$. The dashed curves show
the results for $\Omega_{b,cool} = 0.003$, 0.006 and 0.009 (right to left
at large separation) and the solid curves show the results for
$\Omega_{b,cool} = 0.012$, 0.015, 0.018, 0.021, 0.024, 0.030, 0.045 and
0.060 (from left to right at large separation). The models have 10% of the
cold baryons in a bulge. The heavy solid (dashed) curves show the observed
distribution of CLASS (all radio) lenses.

Fig. B.57. (Bottom) The Kolmogorov-Smirnov probability, $P_{KS}$, of
fitting the observed distribution of lenses as a function of the cold
baryon density $\Omega_{b,cool}$. The squares (triangles) indicate models
with no bulge (10% of the cooled material in a bulge), and the solid
(dashed) lines correspond to fitting the CLASS (all radio) lenses. For
comparison, the horizontal error bar is the estimate by Fukugita, Hogan &
Peebles (1998) for the cold baryon (stars, remnants, cold gas) content of
local galaxies. The vertical line marks the total baryon content of the
concordance model.

B.7.1 The Effects of Halo Structure and the Power Spectrum 

Estimating the structure of clusters using gravitational lensing is
primarily a topic for Part 3, so we include only an abbreviated discussion
of lensing by clusters here. For a fixed cosmological model, two parameters
largely control the abundance of cluster lenses. First, the abundance of
clusters varies nearly exponentially with the standard normalization
$\sigma_8 \simeq 1$ of the power spectrum on $8h^{-1}$ Mpc scales. Second,
the cross sections of the individual clusters depend strongly on the
exponent of the central density cusp of the cluster. There are recent
studies of these issues by Li & Ostriker (2002, 2003), Huterer & Ma (2004),
Kuhlen, Keeton & Madau (2004), Oguri et al. (2004), and Oguri & Keeton
(2004).

We can understand the general effects of halo structure very easily from
our simple power law model in Eqn. B.9. In §B.3 we normalized the models to
have the same Einstein radius, but we now want to normalize them to all
have the same total mass interior to some much larger radius $R_0$. This is
roughly what happens when we keep the virial mass and break radius of the
halo constant but vary the central density exponent, $\rho \propto r^{-n}$.
The deflection profile becomes

$$\alpha(\theta) = \frac{b_0^2}{R_0} \left( \frac{\theta}{R_0}
\right)^{2-n} \qquad (B.121)$$

where $b_0 \ll R_0$ sets the mass interior to $R_0$ and we recover our old
example if we let $b = b_0 = R_0$. The typical image separation is
determined by the tangential critical line at
$\theta_t = R_0 (b_0/R_0)^{2/(n-1)}$, so more centrally concentrated lenses
(larger $n$) produce larger image separations when $b_0/R_0 \ll 1$. The
radial caustic lies at $\beta_r = f(n)\theta_t$ where $f(n)$ is a not very
interesting function of the index $n$, so the cross section for multiple
imaging is $\sigma \propto \theta_t^2 \propto R_0^2 (b_0/R_0)^{4/(n-1)}$.
For an SIS profile ($n = 2$) $\sigma \propto b_0^4/R_0^2$, while the cross
section for a Moore profile ($n = 3/2$), $\sigma \propto b_0^8/16R_0^6$, is
significantly smaller. We cannot go to the limit of an NFW profile
($n = 1$) because our power law model has a constant surface density rather
than a logarithmically divergent surface density in the limit as
$n \rightarrow 1$, but we can see that as the density profile becomes
shallower the multiple image cross section drops rapidly when the models
have constant mass inside a radius which is much larger than their Einstein
radius. As a result, the number of group or cluster lenses depends strongly
on the central exponent of the density distribution even when the mass
function of halos is fixed. Magnification bias will weaken the dependence
on the density slope because the models with shallower slopes and smaller
cross sections will generally have higher average magnifications. The one
caveat to these calculations is that many groups or clusters will have
central galaxies, and the higher surface density of the galaxy can make the
central density profile effectively steeper than the CDM halo in isolation.
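
The dependence of the cross section on the central exponent can be checked
numerically. The sketch below assumes the power law deflection
$\alpha(\theta) \propto \theta^{2-n}$ normalized to a fixed mass inside
$R_0$, as in the discussion above. The explicit form used for $f(n)$ is
derived here from that assumed deflection law (it is not quoted in the
text), and all function names are hypothetical.

```python
import math


def tangential_radius(b0, R0, n):
    """Tangential critical radius, theta_t = R0 * (b0/R0)**(2/(n-1)).

    Follows from alpha(theta_t) = theta_t for the assumed deflection
    alpha = (b0**2 / R0) * (theta / R0)**(2 - n), valid for 1 < n <= 2.
    """
    return R0 * (b0 / R0) ** (2.0 / (n - 1.0))


def radial_caustic_factor(n):
    """f(n) = beta_r / theta_t for the assumed power law deflection.

    Derived here by setting d(alpha)/d(theta) = 1; it tends to 1 in the
    SIS limit n -> 2 and equals 1/4 for a Moore-like cusp, n = 3/2.
    """
    if n >= 2.0:
        return 1.0
    return (n - 1.0) * (2.0 - n) ** ((2.0 - n) / (n - 1.0))


def cross_section(b0, R0, n):
    """Multiple-imaging cross section, pi * beta_r**2."""
    beta_r = radial_caustic_factor(n) * tangential_radius(b0, R0, n)
    return math.pi * beta_r ** 2


if __name__ == "__main__":
    b0, R0 = 0.1, 1.0  # illustrative values with b0 << R0
    for n in (2.0, 1.75, 1.5, 1.25):
        print(f"n = {n:4.2f}  theta_t = {tangential_radius(b0, R0, n):.3e}"
              f"  sigma = {cross_section(b0, R0, n):.3e}")
```

With $b_0/R_0 = 0.1$ the cross section falls by many orders of magnitude
between $n = 2$ and $n = 3/2$, which is the steep dependence on the central
slope described above.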

B.7.2 Binary Quasars 

Weedman et al. (1982) reported the discovery of the third "gravitational
lens", Q2345+007, a pair of $z = 2.15$ quasars separated by $7.3''$. The
optical spectra of the two images are impressively similar (e.g. Small et
al. 1997), but repeated attempts to find a lens have failed both in the
optical (e.g. Pelló et al. 1996) and with X-rays (Green et al. 2002).
Q2345+007 is the founding member of a class of objects seen in the optical
as a pair of quasars with very similar spectra, small velocity differences
and separations $3.0'' \lesssim \Delta\theta \lesssim 15.0''$. The most
recent compilation contained 15 examples (Mortlock, Webster & Francis
1999). The incidence of these quasar pairs in surveys is roughly 2 per 1000
LBQS quasars (see Hewett et al. 1998) and 1 per 14000 CLASS radio sources
(Koopmans et al. 2000). The separations of these objects correspond to
either very massive galaxies or groups/clusters. Obvious lenses on these
scales, in the sense that we see the lens, are rare but have an incidence
consistent with theoretical expectations (see Fig. B.52). If, however, even
a small fraction of the objects like Q2345+007 are actually gravitational
lenses, then dark lenses outnumber normal groups and clusters and dominate
the halo population on mass scales above $M \gtrsim 10^{13} M_\odot$.

If the criterion of possessing a visible lens is dropped, so as to allow
for dark lenses, proving objects are lenses becomes difficult. There are
two unambiguous tests - measuring a time delay between the images, which is
very difficult given the long time delays expected for lenses with such
large separations, or using deep imaging to find that the host galaxies of
the quasars show the characteristic arcs or Einstein rings of lensed hosts
(Figs. B.3, B.4). The latter test is feasible with HST$^7$ and will be
trivial with JWST. Spectral comparisons have been the main area of debate.
In the optical, many of the pairs have alarmingly similar spectra if they
are actually binary quasars (e.g. Q2345+007 or Q1634+267, see Small et al.
1997) - indeed, some of these dark lens candidates have more similar
spectra than genuinely lensed quasars (see Mortlock, Webster & Francis
1999). The clearest examples of dark lens candidates that have to be binary
quasars are the cases in which only one quasar is radio loud. These
objects, such as PKS1145-071 (Djorgovski et al. 1987) or MGC2214+3550
(Muñoz et al. 1998), represent 4 of the 15 candidates. Similarly, the
dramatic difference in the flux ratio between optical and X-ray wavelengths
of Q2345+007 is the strongest direct argument for this object being a
binary quasar (Green et al. 2002).

Two statistical arguments provide the strongest evidence that these objects
must be binary quasars independent of any weighting of spectral
similarities. The first argument, due to Kochanek, Muñoz & Falco (1999), is
that the existence of binary quasars like MGC2214+3550 in which only one of
the quasars is radio loud predicts the incidence of pairs in which both are
radio quiet. We can label the quasar pairs as either $O^2R^2$, where both
quasars are seen in the optical (O) and the radio (R), $O^2R$, where only
one quasar is seen in the radio, or $O^2$, where neither quasar is seen in
the radio. Lenses must be either $O^2R^2$ or $O^2$ pairs. Surveys of
quasars find that only $P_R \simeq 10\%$ of quasars are radio sources with
3.6 cm fluxes above 1 mJy (e.g. Bischof & Becker 2000). If all the quasar
pairs were binary quasars and the probability of being radio loud is
independent of whether a quasar is in a binary, then the relative numbers
of $O^2$, $O^2R$ and $O^2R^2$ binaries should be 1 to $2P_R = 0.2$ to
$P_R^2 = 0.01$. Given that we observed 4 $O^2R$ binaries we should observe
20 $O^2$ binaries and 0.2 $O^2R^2$ binaries. This statistical pattern
matches the data, and Kochanek, Muñoz & Falco (1999) found that the most
probable solution was that all quasar pairs were binary quasars, with an
upper limit of only 8% (68% confidence) on the fraction that could be dark
lenses. With the subsequent expansion of the quasar pair sample and the
discovery of the first $O^2R^2$ binary (B0827+525, Koopmans et al. 2000),
these limits could be improved.
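
The counting argument is simple enough to reproduce directly. The sketch
below is only an illustration of the arithmetic, assuming that each member
of a binary is independently radio loud with probability $P_R$. The
function names are hypothetical, and the `approximate` flag switches
between the small-$P_R$ ratios used in the text and the exact binomial
fractions.

```python
def pair_fractions(p_radio):
    """Exact binomial fractions of (O2, O2R, O2R2) pairs.

    Assumes each quasar in a binary is independently radio loud with
    probability p_radio.
    """
    q = 1.0 - p_radio
    return q * q, 2.0 * p_radio * q, p_radio * p_radio


def expected_counts_from_o2r(n_o2r, p_radio, approximate=True):
    """Expected numbers of O2 and O2R2 binaries given n_o2r observed O2R pairs.

    With approximate=True the relative weights are 1 : 2P : P**2, the
    small-P limit quoted in the text; otherwise the exact binomial
    fractions are used.
    """
    if approximate:
        w_o2, w_o2r, w_o2r2 = 1.0, 2.0 * p_radio, p_radio ** 2
    else:
        w_o2, w_o2r, w_o2r2 = pair_fractions(p_radio)
    scale = n_o2r / w_o2r
    return scale * w_o2, scale * w_o2r2


if __name__ == "__main__":
    for approx in (True, False):
        n_o2, n_o2r2 = expected_counts_from_o2r(4, 0.1, approximate=approx)
        print(f"approximate={approx}: O2 ~ {n_o2:.1f}, O2R2 ~ {n_o2r2:.2f}")
```

The exact binomial weights change the expected numbers only slightly (about
18 rather than 20 $O^2$ binaries), so the conclusion does not depend on the
approximation.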

The second statistical argument is that the dark lens candidates do not
have the statistical properties expected for lenses. Three aspects of the
quasar pairs make them unlikely to be lenses simply given the properties of
gravitational lensing. First, there are no four-image dark lens candidates
even though a third of the normal lenses are quads. Second, many of the
dark lens candidates have very high flux ratios between the images - 4 of
the 9 ambiguous quasar pairs considered by Rusin (2002) have flux ratios of
greater than 10:1. Magnification bias makes such large flux ratios very
improbable for true gravitational lenses (§B.6.6, Kochanek 1995b). Third,
the suppression of central/third/odd images in the lens population is a
consequence of baryonic cooling and the resulting increase of the central
surface density. Standard dark matter halos with their shallow central
cusps, $\rho \propto r^{-1}$, generally produce detectable third images.
Since it is probably a requirement for a lens to remain dark that the
baryons in the halo cannot cool (or they would form stars), you would
expect the typical dark lens to resemble APM08279+5255 and have an easily
detectable third image (Rusin 2002). Thus, in the context of CDM we would
expect dark lenses to be standard cuspy density distributions like the NFW
model (Eqn. B.60). Rusin (2002) evaluated the likelihood of the quasar
pairs assuming that dark lenses have the structure of CDM halos and found
that the observed flux ratios and the lack of three-image dark lenses were
extremely unlikely. Only the real lens APM08279+5255 had a significant
probability of being produced by a dark CDM halo, although for this case I
think the exposed cusp/disk lens explanation for the morphology is more
likely.

7 We detected the host galaxies of the Q2345+007 quasars in the CASTLES
H-band image. Their morphology is probably inconsistent with the lens
hypothesis, but we viewed the data as too marginal to publish the result.

The evidence overwhelmingly favors interpreting the quasar pairs as binary
quasars. However, as originally pointed out by Djorgovski (1991), the one
problem with the binary hypothesis is that the incidence of the quasar
pairs is two orders of magnitude above that expected from an extrapolation
of the quasar-quasar correlation function on scales of Mpc. As discussed in
Kochanek, Muñoz & Falco (1999) and Mortlock, Webster & Francis (1999), the
incidence can be increased if the incipient merger of the two host galaxies
is triggering the quasar activity. The separation distribution of the
binary quasars is crudely compatible with tidally triggered activity when
the merger starts, followed by a coalescence of the host galaxies driven by
tidal friction. Small separation binary quasars ($\Delta\theta < 3.0''$)
are rare because the decay of the host galaxy orbits accelerates as their
separation diminishes. Well-measured angular distributions of binary
quasars, potentially obtainable from SDSS, might allow detailed
explorations of the triggering and merging physics.

B.8 The Role of Substructure 

Simulations of CDM halos predicted many more small satellites than were
actually observed in the Milky Way (e.g. Kauffmann et al. 1993, Moore et
al. 1999, Klypin et al. 1999). Crudely 5-10% of the mass was left in
satellites, with perhaps 1-2% at the projected separations of 1-2$R_e$
where we see most lensed images (e.g. Zentner & Bullock 2003, Mao et al.
2004). This is far larger than the fraction of 0.01-0.1% in observed
satellites (e.g. Chiba 2002). Solutions were proposed in three broad
classes: hide the satellites by preventing star formation so they are
present but dark (e.g. Klypin et al. 1999, Bullock et al. 2000), destroy
them using self-interacting dark matter (e.g. Spergel & Steinhardt 2000),
or avoid forming them by changing the power spectrum to something similar
to warm dark matter with significantly less power on the relevant mass
scales (e.g. Bode et al. 2001). These hypotheses left the major
observational challenge of distinguishing dark satellites from non-existent
ones. This became known as the CDM substructure problem.

Fig. B.58. The most spectacular example of an anomalous flux ratio,
SDSS0924+0219 (Inada et al. 2003). In this CASTLES infrared HST image, the
D image should be comparable in brightness to the A image, but is actually
an order of magnitude dimmer. The A and B images are minima, while C and D
are saddle points. The contours are spaced by factors of two from the peak
of the A image. The lens galaxy is seen at the center. At present we do not
know whether the suppression of the saddle point in this lens is due to
microlensing or substructure. If it is microlensing, ongoing monitoring
programs should see it return to its expected flux within approximately 10
years.

It was well known in the lensing community that the fluxes of lensed images
were usually poorly fit by lens models. There was a long litany of reasons
for ignoring them arising from possible systematic errors which can corrupt
image fluxes. Differential effects between the images from the interstellar
medium of the lens can corrupt the fluxes (dust in the optical/IR, scatter
broadening in the radio, see §B.9.1). Time delays combined with source
variability can corrupt any single-epoch measurement. Microlensing by the
stars in the lens galaxy can modify the fluxes of any sufficiently compact
component of the source (at a minimum the quasar accretion disk, see Part
4). The most peculiar problem was that of anomalous flux ratios in radio
lenses. Radio sources are essentially unaffected by the ISM of the lens
galaxy in low resolution observations that minimize the effects of scatter
broadening (VLA rather than VLBI), true absorption appears to be rare,
radio sources generally show little variability even when monitored, and
most of the flux should come from regions too large to be affected by
microlensing. Yet in B1422+231, for example, the three cusp images violated
the cusp relation for their fluxes (that the sum of the signed
magnifications of the three images should be zero, see Metcalf & Zhao 2002,
Keeton, Gaudi & Petters 2003, or Schneider, Ehlers & Falco 1992).$^8$

It is easier to outline the problem of anomalous flux ratios near a fold
caustic (such as images A and D in SDSS0924+0219, see Fig. B.58) than near
a cusp caustic. Near a fold, the lens equations can be reduced to a
one-dimensional model with

$$\beta = \theta (1 - \Psi'') - \frac{1}{2} \Psi''' \theta^2 \rightarrow
-\frac{1}{2} \Psi''' \theta^2 \qquad (B.122)$$

and inverse magnification

$$\mu^{-1} = (1 - \Psi'') - \Psi''' \theta \rightarrow -\Psi''' \theta
\qquad (B.123)$$

where we choose our coordinates such that there is a critical line at
$\theta = 0$ (i.e. $1 - \Psi'' = 0$) and the primes denote derivatives of
the potential. These equations are easily solved to find that you have
images at $\theta_\pm = \pm(-2\beta/\Psi''')^{1/2}$ if the argument of the
square root is positive and no solutions otherwise - as you cross the fold
caustic ($\beta = 0$) two images are created or destroyed on the critical
line at $\theta = 0$. Their inverse magnifications of
$\mu_\pm^{-1} = \mp(-2\beta\Psi''')^{1/2}$ are equal in magnitude but
reversed in sign. Hence, if the assumptions of the Taylor expansion hold,
the images merging at a fold should have identical fluxes. Either by
guessing or by tedious algebra you can determine that the fractional
correction to the magnification from the next order term is of order
$\theta_\pm \Psi^{(4)}/\Psi'''$. For any reasonable central potential where
the images are at radius $\theta_0$ from the lens center, the fractional
correction will be of order $\theta_\pm/\theta_0 \sim 0.1$ for the typical
pair of anomalous images. Hence, using gravity to produce the anomalous
flux ratios requires terms in the potential with a length scale comparable
to the separation of the images to significantly violate the rule that they
should have similar fluxes. Mao & Schneider (1998) pointed out that a very
simple way of achieving this was to put a satellite near the images, and
they found that this could explain the anomaly in B1422+231. Metcalf &
Madau (2001, also see Bradač et al. 2002 for images of the magnification
patterns expected from a CDM halo) put these two pieces together, pointing
out that if normal satellite galaxies were too rare to make anomalous flux
ratios common, the missing CDM substructure was not. They predicted that in
CDM, anomalous flux ratios should be common.
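
The fold argument can be illustrated with a few lines of code. The sketch
below solves the one-dimensional fold model at leading order and then adds
the next Taylor term, $-\Psi^{(4)}\theta^3/6$, to show the size of the
resulting flux asymmetry. The function names and the numerical values
(angles in units of $\theta_0$, with $\Psi''' \sim 1/\theta_0$ and
$\Psi^{(4)} \sim 1/\theta_0^2$) are illustrative assumptions, not values
taken from any particular lens.

```python
import math


def fold_images(beta, psi3):
    """Leading-order fold model, beta = -psi3 * theta**2 / 2.

    Returns a list of (theta, inverse_magnification) pairs, or an empty
    list if the source lies on the side of the caustic with no images.
    """
    arg = -2.0 * beta / psi3
    if arg < 0.0:
        return []
    theta = math.sqrt(arg)
    return [(t, -psi3 * t) for t in (theta, -theta)]


def fold_images_next_order(beta, psi3, psi4, iterations=50):
    """Same model with the next Taylor term included:

        beta = -psi3 * theta**2 / 2 - psi4 * theta**3 / 6.

    Each leading-order root is refined with Newton iterations.
    """
    images = []
    for theta, _ in fold_images(beta, psi3):
        for _ in range(iterations):
            f = -0.5 * psi3 * theta ** 2 - psi4 * theta ** 3 / 6.0 - beta
            df = -psi3 * theta - 0.5 * psi4 * theta ** 2
            theta -= f / df
        images.append((theta, -psi3 * theta - 0.5 * psi4 * theta ** 2))
    return images


if __name__ == "__main__":
    beta, psi3, psi4 = -0.005, 1.0, 1.0  # images near theta = +/- 0.1
    for label, imgs in (("leading order", fold_images(beta, psi3)),
                        ("next order", fold_images_next_order(beta, psi3, psi4))):
        (t1, m1), (t2, m2) = imgs
        print(f"{label}: theta = {t1:+.4f}, {t2:+.4f};"
              f" flux ratio = {abs(m2 / m1):.3f}")
```

At leading order the two images have identical fluxes, and the next order
term changes the flux ratio by an amount of order
$\theta_\pm \Psi^{(4)}/\Psi'''$, i.e. a few to ten percent for these
illustrative values, far short of an order-of-magnitude anomaly.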

8 In specific models there can also be global invariants relating image
positions and magnifications (e.g. Witt & Mao 2000, Hunter & Evans 2001,
Evans & Hunter 2002). These results are usually for simple softened power
law models using either ellipsoidal potentials or an external shear rather
than ellipsoidal cuspy density distributions with an external shear, so
their applicability to the observed lenses is unclear.

Fig. B.59. (Top) A Monte Carlo test for estimating substructure surface
densities. The heavy curves show the estimated probability distribution for
the substructure surface density fraction in a sample of 7 four-image
lenses in which the input fraction was 5% (marked by the vertical line).
The points on the curve show the median, $1\sigma$ and $2\sigma$ confidence
limits. The output distributions are consistent with the true input
fraction. The dashed line shows how the accuracy would improve given a
sample of 56 lenses (i.e. combining the 8 trials of 7 lenses each).

Fig. B.60. (Bottom) The same method applied to the real data. The three
distributions show the effects of changing assumptions on the actual flux
measurement errors - the greater the measurement uncertainties the less
substructure surface density is required to explain the flux ratio
anomalies. The middle case (10%) is probably slightly too conservative (20%
is ridiculously conservative and 5% is probably too optimistic).

If we add a population of satellites with surface density
$\kappa_{sat} = \Sigma_{sat}/\Sigma_c$ near the images we can estimate the
nature of the perturbations. If we model them as pseudo-Jaffe potentials
with critical radius $b$ and break radius $a = (b b_0)^{1/2}$,$^9$ then the
satellites produce a deflection perturbation of order

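$$\langle \delta\theta^2 \rangle^{1/2} \sim 10^{-3} b_0
\left( 10 \kappa_{sat} \right)^{1/2}
\left( \frac{10^3 b}{b_0} \right)^{3/4}. \qquad (B.124)$$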

Only massive satellites will be able to produce deflection perturbations
large enough to be detected given typical astrometric errors. Because the
astrometric constraints for lenses are so accurate, generally better than
$0.005''$, satellites with deflection scales larger than
$b \gtrsim 10^{-2} b_0$ will usually have observable effects on model fits
and must be included in the basic lens model. The shear perturbation, whose
estimate involves a Coulomb logarithm $\ln\Lambda = \ln(a/s)$ required to
make the integral converge at small separations, is significantly larger.
The effects of substructure gain on those from the primary lens as we move
to quantities requiring more derivatives of the potential because the
substructure has less mass but shorter length scales. For example, most
astronomical objects have masses and sizes that scale with internal
velocity $\sigma_v$ as $M \propto \sigma_v^4$ and $R \propto \sigma_v^2$.
So time delays, which depend on the (two-dimensional) potential
$\Psi \propto M \propto \sigma_v^4$, will be completely unaffected by
substructure. Deflections, which require one spatial derivative of the
potential, $\alpha \propto \Psi/R \propto \sigma_v^2$, are affected only by
the more massive substructures. Magnifications, which require two spatial
derivatives of the potential, $\kappa \sim \gamma \sim \Psi/R^2 \propto
\sigma_v^0$, are affected equally by all mass scales provided the Einstein
radius of the object is larger than the characteristic size of the source.
Substructure will also affect brighter images more than fainter images
because the magnifications of the brighter images are more unstable to
small perturbations. Recall that the magnification
$\mu = (\lambda_+ \lambda_-)^{-1}$ where one of the eigenvalues
$\lambda_\pm = 1 - \kappa \pm \gamma$, usually $\lambda_-$, is small for a
highly magnified image. If we now add a shear perturbation $\delta\gamma$,
the fractional perturbation to the magnification is of order
$\delta\gamma/\lambda_-$, so you have a bigger fractional perturbation to
the magnification for the same shear perturbation if the image is more
highly magnified. The last important effect from substructure, for which I
know of no simple, qualitative explanation, is that substructure
discriminates between saddle points and minima when it is a small fraction
of the total surface density (Schechter & Wambsganss 2002, Keeton 2003b).
In this regime, the magnification distributions for the saddle points
develop an extended tail toward demagnification that is not present for the
minima.

9 This is the tidal truncation radius for an SIS of critical radius $b$
orbiting in an SIS of critical radius $b_0 \gg b$. The total satellite mass
is $\simeq \pi a b \Sigma_c$.
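
Two of these scalings are easy to check numerically. The sketch below is an
order-of-magnitude illustration under stated assumptions, not the
calculation behind the estimates above. The deflection estimate treats the
satellites as a random (Poisson) population, each deflecting by about $b$
inside its break radius $a$ and ignoring logarithmic corrections, which
gives $\langle \delta\theta^2 \rangle^{1/2} \sim (\kappa_{sat} a b)^{1/2}$.
The magnification functions follow directly from
$\mu = (\lambda_+ \lambda_-)^{-1}$. All function names and example values
are hypothetical.

```python
import math


def rms_deflection(kappa_sat, b, b0):
    """Rough random-walk estimate of the rms deflection perturbation.

    Assumption: <dtheta^2>^(1/2) ~ sqrt(kappa_sat * a * b) with the break
    radius a = sqrt(b * b0); logarithmic factors are neglected.
    """
    a = math.sqrt(b * b0)
    return math.sqrt(kappa_sat * a * b)


def magnification(kappa, gamma):
    """Signed magnification mu = 1 / (lambda_plus * lambda_minus)."""
    lam_plus = 1.0 - kappa + gamma
    lam_minus = 1.0 - kappa - gamma
    return 1.0 / (lam_plus * lam_minus)


def fractional_magnification_change(kappa, gamma, dgamma):
    """Exact fractional change in mu when the shear changes by dgamma."""
    mu0 = magnification(kappa, gamma)
    mu1 = magnification(kappa, gamma + dgamma)
    return (mu1 - mu0) / mu0


if __name__ == "__main__":
    # Deflection scale in units of b0 for an illustrative satellite population.
    print(f"rms deflection ~ {rms_deflection(0.1, 1e-3, 1.0):.2e} b0")

    # Same shear perturbation applied to a bright and a faint image
    # (kappa = gamma, as for an SIS, so lambda_minus = 1 - 2 kappa).
    for kappa in (0.45, 0.25):
        mu = magnification(kappa, kappa)
        dmu = fractional_magnification_change(kappa, kappa, 0.01)
        print(f"mu = {mu:5.1f}: fractional change = {dmu:.3f}")
```

The estimate scales as $\kappa_{sat}^{1/2} b^{3/4}$ at fixed $b_0$, and the
same small shear perturbation produces a fractional magnification change
roughly ten times larger for the $\mu \simeq 10$ image than for the
$\mu \simeq 2$ image.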
Fig. B.61. Saddle point suppression in lenses. The three panels show the
cumulative distributions of model flux residuals,
$\log(f_{obs}/f_{mod})$, in the real data, assuming constant fractional
flux errors for each image. The solid (dashed) lines are for minima (saddle
points), with squares (no squares) for the distribution corresponding to
the most (least) magnified image. From top to bottom the distributions are
shown for samples of 8 radio, 10 optical or 15 total four-image lenses. If
the flux residuals are created by propagation effects we would not expect
the distributions to depend on the image parity or magnification, while if
they are due to low optical depth substructure we would expect the
distribution for the brightest saddle points to be shifted to lower
observed fluxes.

It turns out that anomalous flux ratios are very common - a fact which had
been staring us in the face but was ignored because most people (including
the author!) were mainly just annoyed that the flux ratios could not be
used to constrain the potential of the primary lens so as to determine the
radial mass profile. When Dalal & Kochanek (2002) collected the available
four-image radio lenses to estimate the abundance of substructure, they
found that 5 of 6 systems showed anomalies. In order to estimate the
abundance of substructure, Dalal & Kochanek (2002) developed a Bayesian
Monte Carlo method which estimated the likelihood that adding substructure
would significantly improve models of seven four-image lenses, including
the fact that the model for the primary lens would have to be adjusted each
time any substructure was added. Under the assumption that the
uncertainties in flux measurements (systematic as well as statistical) were
10%, they found a substructure mass fraction of $0.006 < f_{sat} < 0.07$
(90% confidence) with a median estimate of $f_{sat} = 0.02$. This is
consistent with expectations from CDM simulations, including estimates of
the destruction of the satellites in the inner regions of galaxies (Zentner
& Bullock 2003, Mao et al. 2004), and too high to be explained by normal
satellite populations. Because the result is driven by the flux anomalies,
which do not depend on the mass of the substructures, rather than
astrometric anomalies, which do depend on the mass, the results had almost
no ability to estimate the mass scale associated with the substructure.

While substructure with approximately the surface density expected from 
CDM is consistent with the data, it is worth examining other possibilities. 
We would expect any effect from the ISM to be strongly frequency dependent 
(whether in the radio or in the optical). At least for radio lenses, Kochanek 
& Dalai ( 2004 ) found that the optical depth function needed to explain the 
radio flux anomalies would have to be gray, ruling out all the standard radio 
suspects. We would also expect propagation effects at radio frequencies to 
preferentially affect the faintest images because they have the smallest an- 
gular sizes - remember that more magnified images are always bigger even 
if you cannot resolve the change in size. The ISM also cannot discriminate 
between images based on parity - the ISM is a local property of the lens 
and the parity is not, so they cannot show a correlation. Hence, if radio 
propagation effects created the anomalies they should be the same for min- 
ima and saddle points and more important for the fainter than the brighter 
images. Fig. IB.Bll shows the cumulative distributions of flux residuals for ra- 
dio, optical and combined four-image lens samples from Kochanek & Dalai 
(2004). The bright saddle point images clearly have a different distribution 
in each case, as we would expect for substructure but not for the ISM. The 
Kolmogorov-Smirnov test significance of the differences between the most 
magnified saddle points and the other three types of images (brightest mini- 
mum, faintest minimum, faintest saddle) is 0.04%, 5% and 0.3% for the radio, 
optical and joint samples respectively. The next most discrepant image is the 
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brightest minimum, also as expected for substructure, but with less signifi- 
cance. Various statistical games (bootstrap resampling methods of estimating 
significance or testing for anomalies) always give the same results. Thus, the 
ISM is ruled out as an explanation. 
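
As an illustration of the kind of comparison used here, the following is a minimal sketch of a two-sample Kolmogorov-Smirnov test in plain Python. The residual values are invented placeholders, not the Kochanek & Dalal (2004) data, and the significance uses the standard large-sample series with a common small-sample correction. That choice is our assumption about how such a test might be run, not a description of the published analysis.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Largest distance between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d_max = 0.0
    for x in a + b:  # the difference can only change at a data point
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max


def ks_significance(d, n_a, n_b, terms=100):
    """Approximate chance of a distance >= d if both samples share one parent."""
    root_n = math.sqrt(n_a * n_b / (n_a + n_b))
    lam = (root_n + 0.12 + 0.11 / root_n) * d
    if lam < 0.3:  # the series converges poorly here; the probability is ~1
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Invented log flux-ratio residuals, for illustration only.
bright_saddles = [-0.9, -0.7, -0.6, -0.4, -0.3, -0.2]
other_images = [-0.2, -0.1, -0.05, 0.0, 0.05, 0.1, 0.15, 0.2, 0.3]

d = ks_statistic(bright_saddles, other_images)
p = ks_significance(d, len(bright_saddles), len(other_images))
print(f"D = {d:.2f}, significance = {p:.4f}")
```

A small significance means the two sets of residuals are unlikely to be drawn from the same distribution, which is the sense in which the bright saddle points stand apart from the other image types.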

Even though simple Taylor series arguments make it unlikely that changes
to the central potential are a solution (see §B.4.4), the idea still has its
advocates (Evans & Witt 2003, Quadri et al. 2003, Moller, Hewitt & Blain 2003,
Kawano et al. 2004). The basic answer is that it is possible to create flux
anomalies by making the deviations of the central potential from ellipsoidal
sufficiently large for the angular structure of the potential to change rapidly
enough between nearby images to produce the necessary magnification changes.
There are three basic problems with this solution (see §B.4.6 as well).

The first problem is that the required deviations from an ellipsoidal profile
are far too large. This is true even though the biggest survey of such models
allowed image positions to shift by approximately 10 times their actual
uncertainties in order to alter the image fluxes (Evans & Witt 2003); had they
forced the models to match the true astrometric uncertainties, they would
have needed even larger perturbations. Kochanek & Dalal (2004) found that
models fitting the flux anomalies required $|a_4| \gtrsim 0.01$, compared to the typical
values observed for galaxies and simulated halos of $|a_4| \sim 0.01$ (see §B.4.4). It is
fair to say, however, that the quantitative results on the multipole structure 
of simulated halos are limited. 

The second problem is that when we test these solutions in lenses for
which we have additional model constraints, the models are forced back toward
the standard ellipsoidal models. The basic difficulty, as Evans & Witt
(2003) show, is that fitting image positions and fluxes with
potentials of the form $rF(\theta)$ can be reduced to a problem in linear algebra if
$F(\theta)$ is expanded as a multipole series: by adding enough terms it is possible
to fit any four-image lens exactly. The reasons go back to the lack of
constraints we discussed in §B.4.6. Fig. B.27 illustrates this point using the
lens B1933+503. Kochanek & Dalal (2004) first fit the four compact images
with a model including deviations from an ellipsoidal surface density. With 
sufficiently strong deviations there were models that could eliminate the flux 
anomalies in this system. However, this lens, B1933+503, actually has three 
components to its source - a compact core forming the four-image system 
with the anomaly but also two radio lobes lensed into another four-image system
and a two-image system for 10 images in all (Fig. B.7). When we add
the constraints from these other images the model is forced back to being a 
standard ellipsoidal model with a flux ratio anomaly. In the future, the degree 
to which lens galaxy potentials are ellipsoidal could be thoroughly tested in 
the lenses with Einstein ring images of their host galaxies. 

The third problem with using the central potential to produce flux ratio 
anomalies is that it does not lead to the discrimination between saddle points 
and minima shown in Fig. B.61. Kochanek & Dalal (2004) demonstrate this
Fig. B.62. The improvement in the fit to the Ros et al. (2000) VLBI data on
MG0414+0534 from adding an additional lens with an Einstein radius 15% that of
the primary lens galaxy as a function of its position. The squares show the location
of the quasar images, the central circles mark the position of the main lens galaxy
and the single circle marks the position of Object X (see Fig. B.6). The heavy
contour has the same $\chi^2 = 123$ as single component models, and the contours then
drop by a factor of 0.2 per lighter contour to a minimum of $\chi^2 \simeq 0.6$ almost
exactly at the position of Object X.

with Monte Carlo simulations, but the basic reason is simple. Consider a 
lens like PG1115+080 with two images merging at a saddle point. The sense 
with which the saddle point and minima are perturbed depends on the phase 
of the higher order multipoles relative to the images and the critical line, 
but for any fixed lens potential, that phase varies depending on the source 
position, so the average effect cannot make the bright saddle points show 
a significantly different set of properties from the bright minima. Every ob- 
served flux anomaly could be explained by adding complex angular structures 
to the main lens, but the inability of these models to differentiate between 
saddle points and minima would still rule them out. 

For the moment there are two barriers to improving estimates of the sub- 
structure mass fraction. First, radio lens surveys have run out of sources 
bright enough to conduct efficient surveys. This will only change as upgrades 
to existing radio arrays are completed. The proposed Merlin and VLA up- 
grades will provide both sensitivity and resolution improvements that will 
Fig. B.63. VLBI maps of MG2016+112 (Koopmans et al. 2002). The large difference
in the C11/C12 separation as compared to the C13/C2 separation is the clearest
example of an "astrometric" anomaly in a lens. The critical line passes between C12
and C13 and by symmetry we would expect the separations of the subcomponents
on either side of the critical line to be similar. In this case the cause of the asymmetry
seems to be a galaxy D about $0.8''$ South of the C image (see Fig. B.6). Galaxy
D has the same redshift as the primary lens (Koopmans & Treu 2002).
make the next generation of radio lens surveys easier than the last. Second,
searches for substructure using optical quasars need to separate the effects 
of microlensing and substructure. With simple imaging this can be done by 
finding parts of the quasar which are sufficiently extended to avoid significant
contamination from microlensing. Emission line (e.g. Moustakas & Metcalf
2003) and dust emission regions should both be large enough to filter out
the effects of the stars. Studying emission line ratios is now relatively easy
because of the new generation of small-pixel integral field spectrographs on
8m-class telescopes. Mid-infrared flux ratios for the dusty regions remain
difficult, but they have been obtained for one lens (Q2237+0305, Agol et al. 2000)
and could be extended to several more.

The gold standard, however, would be astrometric detection of dark substructure,
so that we would obtain a direct mass estimate. In all the present
analyses, the most massive substructures were included as part of the model.
They were not, however, dark substructures because they corresponded to
satellites visible in HST images of the lenses. For example, Object X in
MG0414+0534 (Fig. B.6) has effects on the image positions that are virtually
impossible to reproduce with changes in the potential of the central lens galaxy
(Trotter, Winn & Hewitt 2000), while models with it easily fit the data
(Ros et al. 2000). Fig. B.62 shows the dependence of the goodness of fit to
MG0414+0534 on the location of an additional lens component, with a deep
minimum located at the observed position of Object X. The deflections produced
by an object of mass $M$ generally scale as $M^{1/2}$, so it is relatively easy
to detect the deflection perturbations from objects only 1% the mass of the
primary lens. One approach is to search lenses with VLBI structures for signs
of perturbations. This has been attempted for B1152+199 by Metcalf (2002),
but the case for substructure is not very solid given the limited nature of the
data. The cleanest example of astrometric detection of something small, but
sadly not dark, is in the VLBI structure of image C in MG2016+112 (Koopmans
et al. 2002). The asymmetry in the VLBI component separations of
image C on either side of the critical line (see Fig. B.63) is due to a very faint
galaxy $0.8''$ South of the image with a deflection scale ~10% of the primary
lens (see Fig. B.6). This is in reasonable agreement with the prediction from
the H-band magnitude difference of 4.6 mag and the (lens) Faber-Jackson
relation between magnitudes and deflections. In this case, we even know that
the satellite is at the same redshift as the lens because Koopmans & Treu
(2002) accidentally measured its redshift in the course of their observations
to measure the velocity dispersion of the lens galaxy.
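
The two scalings used in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that deflections scale as the square root of the mass, as stated, and, for the magnitude argument, a Faber-Jackson index of 4 with deflections proportional to the square of the velocity dispersion. The function names and the default index are our own choices for illustration.

```python
import math


def deflection_ratio_from_mass_ratio(mass_ratio):
    """Deflection scale of a perturber relative to the primary, for deflection ~ M^(1/2)."""
    return math.sqrt(mass_ratio)


def deflection_ratio_from_magnitude_difference(delta_mag, gamma_fj=4.0):
    """Deflection ratio implied by a magnitude difference.

    Assumes L ~ sigma^gamma_fj (Faber-Jackson) and deflection ~ sigma^2,
    so deflection ~ L^(2/gamma_fj). gamma_fj = 4 is an assumed standard value.
    """
    luminosity_ratio = 10.0 ** (-0.4 * delta_mag)
    return luminosity_ratio ** (2.0 / gamma_fj)


# A perturber with 1% of the primary's mass.
print(deflection_ratio_from_mass_ratio(0.01))

# A satellite 4.6 mag fainter than the primary lens galaxy.
print(deflection_ratio_from_magnitude_difference(4.6))
```

Under these assumptions a 1% mass perturber produces a 10% deflection scale, and a 4.6 mag fainter satellite is predicted to have a deflection scale of roughly 12% of the primary, close to the ~10% inferred astrometrically.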

B.8.1 Low Mass Dark Halos 

When we are examining a particular lens, almost all the substructure will con- 
sist of satellites associated with the lens, with only a ~ 10% contamination 
from other small halos along the line-of-sight to the source (Chen, Kravtsov & 
Keeton 2003). However, the excess of low mass halos in CDM mass functions 
relative to visible galaxies is a much more general problem because the low
mass CDM satellites should exist everywhere, not just as satellites of massive
galaxies. Crudely, luminosity functions diverge as $dn/dL \sim 1/L \sim 1/M$ while
CDM mass functions diverge as $dn/dM \sim M^{-1.8}$, so the fraction of low mass
halos that must be dark increases as $\sim M^{-0.8}$ at low masses. Fig. IB . 5 fl illustrates
this assuming that all low mass halos have baryons which have cooled (e.g.
Gonzalez et al. 2000, Kochanek 20023). In the context of CDM, the solution
to this general problem is presumably the same as for the satellites responsible
for anomalous flux ratios: they exist but lost their baryons before they
could form stars. Such processes are implicit in semianalytic models which
can reproduce the galaxy luminosity function (e.g. Benson et al. 2003) but can
be modeled empirically in much the same way as was employed for the break
between galaxies and clusters in §B.7 (e.g. Kochanek 2003c). In any model, the
probability of the baryons cooling to form a galaxy has to drop rapidly for
halo masses below $\sim 10^{11} M_\odot$ just as it has to drop rapidly for halo masses
above $\sim 10^{13} M_\odot$. Unlike groups and clusters, where we still expect to be able
to detect the halos from either their member galaxies or X-ray emission from 
the hot baryons trapped in the halo, these low mass halos almost certainly 
cannot be detected in emission. 

We can only detect isolated, low-mass dark halos if they multiply image
background sources. For SIS lenses the distribution of image separations for
small separations ($\Delta\theta/\Delta\theta_* \ll 1$, Eqn. IB.1L2"|) scales as

$$ \frac{d\tau_{SIS}}{d\Delta\theta} \propto \Delta\theta^{1+\gamma_{FJ}(1+\alpha)/2} \qquad \text{(B.126)} $$

where $\alpha$ describes the divergence of the mass/luminosity function at low mass
and $\gamma_{FJ}$ is the conversion from mass to velocity dispersion (see §B.6.2). For
the standard parameters of galaxies, $\alpha \simeq -1$ and $\gamma_{FJ} \simeq 4$, the separation
distribution is $d\tau_{SIS}/d\Delta\theta \propto \Delta\theta$. In practice we do not observe this distribution
because the surveys have angular selection effects that prevent the detection
of small image separations (below $0.25''$ for the radio surveys), so the observed
distributions show a much sharper cutoff (Fig. IB.lJl). Even without a cutoff,
there would be few lenses to find: the CLASS survey found 9 lenses with
$0.3'' < \Delta\theta < 1.0''$, in which case we expect only one lens with $\Delta\theta < 0.3''$ even
in the absence of any angular selection effects. A VLBI survey of 3% of the
CLASS sources with milli-arcsecond resolution found no lenses (Wilkinson et
al. 2001), nor would it be expected to for normal galaxy populations. Our
non-parametric reconstruction of the velocity function including selection effects
confirms that the existing lens samples are consistent with this standard
model (Fig. EH-

The result is very different if we extrapolate to low mass with the $\alpha \simeq -1.8$
slope of the CDM halo mass function. The separation distribution becomes
integrably divergent, $d\tau_{SIS}/d\Delta\theta \propto \Delta\theta^{-0.6}$, and we would expect 15 lenses
with $\Delta\theta < 0.3''$ given 9 with $0.3'' < \Delta\theta < 1.0''$. Unfortunately, the Wilkinson
et al. (2001) VLBI survey is too small to rule out such a model. A larger
VLBI survey could easily do so, allowing the lenses to confirm the galaxy
counting argument for the existence of a second break in the density structure
of halos at low mass (Kochanek 2003c, Ma 2003) similar to the one between
galaxies and high mass halos (§B.7). If the baryons in the low mass halos either
fail to cool, or cool and are then ejected by feedback, then their density
distributions should revert to those of their CDM halos. If they are standard
NFW halos, Ma (2003) shows that such low mass dark lenses will be
very difficult to detect even in far larger surveys than are presently possible.
Nonetheless, improving the scale of searches for very small separations from
the initial attempt by Wilkinson et al. (2001) would provide valuable limits
on their existence.
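
The expected numbers of small-separation lenses quoted above follow from integrating the power-law separation distribution. The sketch below assumes that the exponent $1+\gamma_{FJ}(1+\alpha)/2$ holds all the way down to zero separation, with no angular selection effects; the function names are ours.

```python
def separation_exponent(alpha, gamma_fj=4.0):
    """Exponent p in d(tau)/d(dtheta) ~ dtheta^p for SIS lenses at small separations."""
    return 1.0 + 0.5 * gamma_fj * (1.0 + alpha)


def expected_small_separation_lenses(n_observed, alpha, lo=0.3, hi=1.0, gamma_fj=4.0):
    """Lenses expected below `lo` arcsec, given n_observed between `lo` and `hi`.

    Integrates dtheta^p from 0 to lo and from lo to hi and scales by n_observed.
    """
    q = separation_exponent(alpha, gamma_fj) + 1.0  # exponent after integration
    if q <= 0.0:
        raise ValueError("distribution is not integrable at zero separation")
    return n_observed * lo ** q / (hi ** q - lo ** q)


# 9 CLASS lenses between 0.3 and 1.0 arcsec, as quoted in the text.
for alpha in (-1.0, -1.8):
    n_small = expected_small_separation_lenses(9, alpha)
    print(f"alpha = {alpha}: exponent = {separation_exponent(alpha):+.1f}, "
          f"expected below 0.3 arcsec = {n_small:.1f}")
```

For the galaxy-like slope this gives roughly one lens, and for the CDM-like slope roughly 15, matching the numbers in the text.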

The resulting small, dark lenses would be the same as the dark lenses we
discussed in §B.7.2 for binary quasars and explored by Rusin (2002). They will
also create the same problems about proving or disproving the lens hypothesis
as were raised by the binary quasars, with the added difficulty that they will
be far more difficult to resolve. Time delays, while short enough to be easily
measured, will also be on time scales where quasars show little variability.
Confirmation of any small dark lens will probably require systems with three
or four images, rather than two images, and the presence of resolvable (VLBI)
structures.

B.9 The Optical Properties of Lens Galaxies 

The optical properties of lens galaxies and the properties of their interstellar 
medium (ISM) are important for two reasons. First, statistical calculations 
such as those in §B.6 rely on lens galaxies obeying the same scaling relations
as nearby galaxies and the selection effects depend on the properties of the
ISM. Thus, measuring the scaling relations of the observed lenses and the
properties of their ISM is an important part of validating these calculations.
Second, lenses have a unique advantage for studying the evolution of galaxies 
because they are the only sample of galaxies selected based on mass rather 
than luminosity, surface brightness or color. Evolution studies using optically- 
selected samples will always be subject to strong biases arising from the 
difficulty of matching nearby galaxies to distant galaxies. Selection by mass 
rather than light makes the lens samples almost immune to these biases. 

Most lens galaxies are early-type galaxies with relatively red colors and
few signs of significant on-going star formation (like the 3727Å or 5007Å
oxygen lines). The resulting need to measure absorption line redshifts is one
of the reasons that the completeness of the lens redshift measurements is so
poor. Locally, early-type galaxies follow a series of correlations which also
exist for the lens galaxies and have been explored by Im, Griffiths &
Ratnatunga (1997), Keeton, Kochanek & Falco (1998), Kochanek et al. (2000),
Rusin et al. (2003), Rusin, Kochanek & Keeton (2003), van de Ven, van
Dokkum & Franx (2003) and Rusin & Kochanek (2004).






The first, crude correlation is the Faber-Jackson relation between velocity
dispersion and luminosity used in most lens statistical calculations. A typical
local relation is that from §B.6.2 and shown in Fig. B.41. Most lenses lack
directly measured velocity dispersions, but all lenses have a well-determined
image separation $\Delta\theta$. For specific mass models the image separation can
be converted into an estimate of a velocity dispersion, such as the
$\Delta\theta = 8\pi(\sigma_v/c)^2 D_{ds}/D_s$ relation of the SIS, but the precise relationship depends
on the mass distribution, the orbital isotropy, the ellipticity and so forth (see
§B.4.9). For the lenses, there is a close relationship between the Faber-Jackson
relation and aperture mass-to-light ratios. The image separation, $\Delta\theta$, defines
the aperture mass interior to the Einstein ring,

$$ M_{ap} = \frac{\pi}{4}\,\Sigma_c\,\Delta\theta^2 \qquad \text{(B.127)} $$

where $\Sigma_c = c^2 D_s/4\pi G D_{ds} D_d$ is the critical surface density. By image
separation we usually mean either twice the mean distance of the images from the
lens galaxy or twice the critical radius of a simple lens model rather than a
directly measured image separation because these quantities will be less
sensitive to the effects of shear and ellipticity. If we measure the luminosity in the
aperture $L_{ap}$ using (usually) HST, then we know the aperture mass-to-light
(M/L) ratio $\Upsilon_{ap} = M_{ap}/L_{ap}$.
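
As a concrete illustration of converting an image separation into a velocity dispersion, here is a minimal sketch of the SIS relation $\Delta\theta = 8\pi(\sigma_v/c)^2 D_{ds}/D_s$ and its inverse. The distance ratio used in the example is a hypothetical value chosen only for illustration, and the function names are ours.

```python
import math

C_KM_S = 299792.458                          # speed of light in km/s
ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)   # one arcsecond in radians


def sis_image_separation(sigma_v_km_s, dds_over_ds):
    """Image separation in arcsec of an SIS with velocity dispersion sigma_v."""
    dtheta_rad = 8.0 * math.pi * (sigma_v_km_s / C_KM_S) ** 2 * dds_over_ds
    return dtheta_rad / ARCSEC_TO_RAD


def sis_velocity_dispersion(delta_theta_arcsec, dds_over_ds):
    """Velocity dispersion in km/s implied by an image separation, for an SIS."""
    dtheta_rad = delta_theta_arcsec * ARCSEC_TO_RAD
    return C_KM_S * math.sqrt(dtheta_rad / (8.0 * math.pi * dds_over_ds))


# Hypothetical lens: 1 arcsec separation with D_ds/D_s = 0.5.
sigma = sis_velocity_dispersion(1.0, 0.5)
print(f"sigma_v = {sigma:.0f} km/s")
```

As the text cautions, the number obtained this way is tied to the SIS assumption; other mass distributions, orbital anisotropies or ellipticities change the conversion.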

If the mass-to-light ratio varies with radius or with mass, then to compare
values of $\Upsilon_{ap}$ from different lenses we must correct them to a common radius
and common mass. If these scalings can be treated as power laws, then we
can define a corrected aperture mass-to-light ratio $\Upsilon_* = \Upsilon_{ap}(D_d^{ang}\Delta\theta/2R_0)^x$,
where $R_0$ is a fiducial radius and $x$ is an unknown exponent, and we would
expect to find a correlation of the form

$$ \log\Upsilon_* = 2(1+a)\log\Delta\theta + 0.4\,M_{abs} + \text{constant} \qquad \text{(B.128)} $$

where $M_{abs}$ is the absolute magnitude of the lens (in some band) and a value
$a \neq 0$ indicates that the mass-to-light ratio varies either with mass or with
radius. We can then rewrite this in a more familiar form as

$$ M_{abs} = M_{abs,0} + \gamma_{EV} z_l - 1.25\,\gamma_{FJ}\log\left(\frac{\Delta\theta}{\Delta\theta_0}\right) \qquad \text{(B.129)} $$

where $\Delta\theta_0$ sets an arbitrary separation scale, $\gamma_{EV}$ (or a more complicated
function) determines the evolution of the luminosity with redshift, and
$\gamma_{FJ} = 4(1+a)$ sets the scaling of luminosity with normalized separation, defined so
that for an SIS lens (where $\Delta\theta \propto \sigma^2$) the exponent $\gamma_{FJ}$ will match the index
of the Faber-Jackson relation (Eqn. B.102). Fig. B.64 shows the resulting
relation converted to the rest frame B band at redshift zero. The relation is
slightly tighter than local estimates of the Faber-Jackson relation, but the
scatter is still twice that expected from the measurement errors. The best fit
exponent $\gamma_{FJ} = 3.29 \pm 0.58$ (Fig. B.65) is consistent with local estimates and
Fig. B.64. (Top) The "Faber-Jackson" relation for gravitational lenses. The figure
compares the observed absolute B magnitude corrected for evolution to that
predicted from the equivalent of the Faber-Jackson relation for gravitational lenses
(Eqn. B.129). The different point styles indicate whether the lens and source
redshifts were directly measured or estimated. From Rusin et al. (2003).

Fig. B.65. (Bottom) The redshift zero absolute B-band magnitude and effective
exponent of the "Faber-Jackson" relation $L \propto \Delta\theta^{\gamma_{FJ}/2}$ for gravitational lenses.
implies a scaling exponent $a = -0.18 \pm 0.14$ that is marginally non-zero. If
the mass-to-light ratio of early-type galaxies increases with mass as $\Upsilon \propto M^x$,
then $x = -a = 0.18 \pm 0.14$ is consistent with estimates from the fundamental
plane that more massive early-type galaxies have higher mass-to-light ratios.
The solutions also require evolution with $\gamma_{EV} = -0.41 \pm 0.21$, so that early-type
galaxies were brighter in the past. These scalings can also be done in
terms of observed magnitudes rather than rest frame magnitudes to provide
simple estimation formulas for the apparent magnitudes of lens galaxies in
various bands as a function of redshift and separation to an rms accuracy of
approximately 0.5 mag (see Rusin et al. 2003).
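
The arithmetic linking the fitted exponent to the mass-to-light scaling, and the form of the lens Faber-Jackson relation itself, can be sketched as follows. The relation $\gamma_{FJ} = 4(1+a)$ and the magnitude formula are taken from the text; the zero point passed to the second function is a placeholder, since no numerical value is quoted here, and the function names are ours.

```python
import math


def scaling_exponent_from_gamma_fj(gamma_fj, gamma_fj_err):
    """Invert gamma_FJ = 4 (1 + a) for a and propagate the error linearly."""
    return gamma_fj / 4.0 - 1.0, gamma_fj_err / 4.0


def lens_faber_jackson_magnitude(delta_theta, z_lens, m_abs0, gamma_ev, gamma_fj,
                                 delta_theta0=1.0):
    """M_abs = M_abs0 + gamma_EV z_l - 1.25 gamma_FJ log10(dtheta / dtheta0)."""
    return (m_abs0 + gamma_ev * z_lens
            - 1.25 * gamma_fj * math.log10(delta_theta / delta_theta0))


a, a_err = scaling_exponent_from_gamma_fj(3.29, 0.58)
print(f"a = {a:.2f} +/- {a_err:.2f}")

# m_abs0 = -20.0 is a hypothetical zero point used only to exercise the formula.
for dtheta in (0.5, 1.0, 2.0):
    m = lens_faber_jackson_magnitude(dtheta, 0.5, -20.0, -0.41, 3.29)
    print(f"dtheta = {dtheta} arcsec -> M_abs = {m:.2f}")
```

Wider separations map to brighter (more negative) absolute magnitudes, and a negative $\gamma_{EV}$ makes a lens of given separation brighter at higher redshift.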

The significant scatter of the Faber-Jackson relation makes it a crude tool.
Early-type galaxies also follow a far tighter correlation known as the fundamental
plane (FP, Dressler et al. 1987, Djorgovski & Davis 1987) between the
central, stellar velocity dispersion $\sigma_c$, the effective radius $R_e$ and the mean
surface brightness inside the effective radius $\langle SB_e\rangle$ of the form

$$ \log\left(\frac{R_e}{h^{-1}\,\text{kpc}}\right) = \alpha\log\left(\frac{\sigma_c}{\text{km s}^{-1}}\right) + \beta\left(\frac{\langle SB_e\rangle}{\text{mag arcsec}^{-2}}\right) + \gamma \qquad \text{(B.130)} $$

where the slope $\alpha$ and the zero-point $\gamma$ depend on wavelength but the slope
$\beta \simeq 0.32$ does not (e.g. Scodeggio et al. 1998, Pahre, de Carvalho &
Djorgovski 1998). Local estimates for the rest frame B-band give $\alpha = 1.25$ and
$\gamma_0 = -8.895 - \log(h/0.5)$ (e.g. Bender et al. 1998). In principle both the
zero points and the slopes may evolve with redshift, but all existing studies
have assumed fixed slopes and studied only the evolution of the zero point
with redshift. For galaxies with velocity dispersion measurements, the basis
of the method is that measurement of $R_e$ and $\sigma_v$ provides an estimate
of the surface brightness the galaxy will have at redshift zero. The difference
between the measured surface brightness at the observed redshift and
the surface brightness predicted for $z = 0$ measures the evolution of the
stellar populations between the two epochs as a shift in the zero-point $\Delta\gamma$.
The change in the zero-point is related to the change in the luminosity by
$\Delta\log L = -0.4\,\Delta SB_e = \Delta\gamma/(2.5\beta)$. While these estimates are always referred
to as a change in the mass-to-light ratio, no real mass measurement enters
operationally. If, however, we assume a non-evolving virial mass estimate
$M = c_M\sigma_c^2 R_e/G$ for some constant $c_M$, then the FP can be rewritten in
terms of a mass-to-light ratio,

$$ \log\Upsilon = \frac{10\beta-2\alpha}{5\beta}\log\sigma_c + \frac{2-5\beta}{5\beta}\log R_e - \frac{\gamma}{2.5\beta} + \text{constant} \qquad \text{(B.131)} $$

so that if both $\alpha$ and $\beta$ do not evolve, the evolution of the mass-to-light
ratio is $d\log\Upsilon/dz = -(d\gamma/dz)/(2.5\beta)$. Either way of thinking about the FP,
either as an empirical estimator of the redshift zero surface brightness or an
implicit estimate of the virial mass, leads to the same evolution estimates but
alternate ways of thinking about potential systematic errors.
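
For readers who want the intermediate algebra, the following is a sketch of how the mass-to-light form follows from the FP. It assumes $L = 2\pi R_e^2 I_e$ with $\langle SB_e\rangle = -2.5\log I_e + \text{const}$, and the non-evolving virial estimate $M = c_M\sigma_c^2R_e/G$; all additive constants are absorbed into "const".

```latex
% Assumptions: L = 2\pi R_e^2 I_e, <SB_e> = -2.5 log I_e + const,
% and M = c_M sigma_c^2 R_e / G with c_M fixed.
\begin{align}
\log\Upsilon &= \log M - \log L
   = 2\log\sigma_c - \log R_e + 0.4\,\langle SB_e\rangle + \text{const}, \\
\langle SB_e\rangle &= \frac{\log R_e - \alpha\log\sigma_c - \gamma}{\beta}
   \quad\text{(the FP solved for the surface brightness)}, \\
\log\Upsilon &= \frac{10\beta - 2\alpha}{5\beta}\,\log\sigma_c
   + \frac{2 - 5\beta}{5\beta}\,\log R_e - \frac{\gamma}{2.5\beta} + \text{const}.
\end{align}
```

At fixed $\sigma_c$ and $R_e$, and with $\alpha$ and $\beta$ held constant, only the last term changes with redshift, which gives $d\log\Upsilon/dz = -(d\gamma/dz)/(2.5\beta)$.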
Fig. B.66. (Top) Constraints on the B-band luminosity evolution rate
$d\log(M/L)_B/dz$ as a function of the logarithmic density slope $n$ ($\rho \propto r^{-n}$) of
the galaxy mass distribution. Solid (dashed) contours are the 68% and 95%
confidence limits on two parameters (one parameter). These use the self-similar mass
models of Eqn. B.89 and are closely related to the fundamental plane. From Rusin
& Kochanek (2004).

Fig. B.67. (Bottom) Constraints on the mean star formation epoch $\langle z_f\rangle$ as a
function of the logarithmic density slope $n$ ($\rho \propto r^{-n}$) of the galaxy mass distribution.
Solid (dashed) contours are the 68% and 95% confidence limits on two parameters
(one parameter). The horizontal dotted lines mark $\langle z_f\rangle$ = 1.3, 1.4, 1.5, 1.6 and 1.7.
The lens sample favors older stellar populations with $\langle z_f\rangle > 1.5$ at 95% confidence.
These use the self-similar mass models of Eqn. B.89 and are closely related to the
fundamental plane. From Rusin & Kochanek (2004).
Confusion about applications of lenses to the FP and galaxy evolution
usually arises because most gravitational lenses lack direct measurements of
the central velocity dispersion. Before addressing this problem, it is worth
considering what is done for distant galaxies with direct measurements. The
central dispersion appearing in the FP has a specific definition - usually either
the velocity dispersion inside the equivalent of a $3.0''$ aperture in the Coma
cluster or the dispersion inside $R_e/8$. Measurements for particular galaxies
almost never exactly match these definitions, so empirical corrections are ap- 
plied to adjust the velocity measurements in the observed aperture to the 
standard aperture. As we explore more distant galaxies, resolution problems 
mean that the measurement apertures become steadily larger than the stan- 
dard apertures. The corrections are made with a single, average local relation 
for all galaxies - implicit in this assumption is that the dynamical structure 
of the galaxies is homogeneous and non-evolving. This seems reasonable since 
the minimal scatter around the FP seems to require homogeneity, but says 
nothing about evolution. These are also the same assumptions used in the 
lensing analyses. 

If early-type galaxies are homogeneous and have mass distributions that
are homologous with the luminosity distributions, then there is no difference
between the lens FP and the normal kinematic FP, independent of the actual
mass distribution of the galaxies (Rusin & Kochanek 2004). If the mass
distributions are homologous, then the mass and velocity dispersion are related
by $M = c_M\sigma_c^2R_e/G$ where $c_M$ is a constant, $\sigma_c$ is the central velocity
dispersion (measured in a self-similar aperture like the $R_e/8$ aperture used in many
local FP studies), and $R_e$ is the effective radius. If we allow the mass-to-light
ratio to scale with luminosity as $\Upsilon \propto L^x$, then the normal FP can be written
as

$$ \log R_e = \frac{2}{2x+1}\log\sigma_c + \frac{0.4(x+1)}{2x+1}\langle SB_e\rangle + \frac{\log c_M}{2x+1} + \text{constant} \qquad \text{(B.132)} $$

which looks like the local FP (Eqn. B.130) if $\alpha = 2/(2x+1)$ and
$\beta = 0.4(x+1)/(2x+1)$ (see Faber et al. 1987). Thus, the lens galaxy FP will be
indistinguishable from the FP provided early-type galaxies are homologous
and the slopes can be reproduced by a scaling of the mass-to-light ratio (as
they can for $x \simeq 0.3$ given $\alpha \simeq 1.2$ and $\beta \simeq 0.3$, e.g., Jorgensen, Franx
& Kjaergaard 1996 or Bender et al. 1998). All the details about the mass
distribution, orbital isotropies and the radius interior to which the velocity
dispersion is measured enter only through the constant $c_M$ or equivalently
from differences between the FP zero point $\gamma$ measured locally and with
gravitational lenses. In practice, Rusin & Kochanek (2004) show that the zero
point must be measured to an accuracy significantly better than $\Delta\gamma = 0.1$
before there is any sensitivity to the actual mass distribution of the lenses
from the FP. Thus, there is no difference between the aperture mass estimates
for the FP and its evolution and the normal stellar dynamical approach unless
the major assumption underlying both approaches is violated. It also means,
perhaps surprisingly, that measuring central velocity dispersions adds almost
no new information once these conditions are satisfied.
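
The statement that a single mass-to-light scaling reproduces both FP slopes is easy to check numerically. The sketch below uses the relations $\alpha = 2/(2x+1)$ and $\beta = 0.4(x+1)/(2x+1)$ from the text; the function names are ours.

```python
def fp_slopes_from_ml_scaling(x):
    """FP slopes (alpha, beta) implied by homology and M/L ~ L^x."""
    alpha = 2.0 / (2.0 * x + 1.0)
    beta = 0.4 * (x + 1.0) / (2.0 * x + 1.0)
    return alpha, beta


def ml_scaling_from_alpha(alpha):
    """Invert alpha = 2 / (2x + 1) for the mass-to-light exponent x."""
    return 0.5 * (2.0 / alpha - 1.0)


alpha, beta = fp_slopes_from_ml_scaling(0.3)
print(f"x = 0.3 -> alpha = {alpha:.3f}, beta = {beta:.3f}")
print(f"alpha = 1.25 -> x = {ml_scaling_from_alpha(1.25):.2f}")
```

With $x = 0.3$ both slopes land close to the locally measured values, which is why the lens FP and the kinematic FP are hard to tell apart under the homology assumption.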

Rusin & Kochanek (2004) used the self-similar models we described in
§B.4.8 to estimate the evolution rate and the star formation epoch of the
lens galaxies while simultaneously estimating the mass distribution. Thus,
the models for the mass include the uncertainties in the evolution and the
reverse. Fig. B.66 shows the estimated evolution rate, and Fig. B.67 shows
how this is related to a limit on the average star formation epoch $\langle z_f\rangle$ based
on Bruzual & Charlot (1993, BC96 version) population synthesis models. This
estimate is consistent with the earlier estimates by Kochanek et al. (2000)
and Rusin et al. (2003) which used only isothermal lens models, as we would
expect. Van de Ven, van Dokkum & Franx (2003) found a somewhat lower
star formation epoch ($\langle z_f\rangle = 1.8^{+1.4}_{-0.5}$) when analyzing the same data, which
can be traced to differences in the analysis. First, by weighting the galaxies by 
their measurement errors when the scatter is dominated by systematics and 
by dropping two higher redshift lens galaxies with unknown source redshifts, 
the van de Ven et al. (2003) analysis reduces the weight of the higher redshift
lens galaxies, which softens the limits on low (zf). Second, they used a power 
law approximation to the stellar evolution tracks which underestimates the 
evolution rate as you approach the star formation epoch, thereby allowing 
lower star formation epochs. These two effects leverage a small difference in 
the evolution rate (see footnote 10) into a much more dramatic difference in the estimated
star formation epoch. These evolution rates are consistent with estimates for 
cluster or field ellipticals (e.g. van Dokkum et al. 1996, 2001, van Dokkum
& Franx 1996, van Dokkum & Ellis 2003, Kelson et al. 1997, 2000), and
inconsistent with the much faster evolution rates found by Treu et al. (2001,
2002) or Gebhardt et al. (2003).

B.9.1 The Interstellar Medium of Lens Galaxies 

As well as studying the emission by the lens galaxy we can study its ab- 
sorption of emission from the quasar as a probe of the interstellar medium 
(ISM) of the lens galaxies. The most extensively studied effect of the ISM is 
dust extinction because of its effects on estimating the cosmological model 
from optically-selected lenses and because it allows unique measurements of 
extinction curves outside the Local Group. There are also broad band effects
on the radio continuum due to free-free absorption, scatter broadening and 
Faraday rotation. While all three effects have been observed, they have been 
of little practical importance so far. Finally, in both the radio and the optical, 
the lens can introduce narrow absorption features. While these are observed 

10 Rusin & Kochanek (2004) obtained $d\log(M/L)_B/dz = -0.50 \pm 0.19$ including the
uncertainties in the mass distribution, Rusin et al. (2003) obtained $-0.54 \pm 0.09$
for a fixed SIS model, and van de Ven et al. (2003) obtained $-0.62 \pm 0.13$ for a
fixed SIS model.
Fig. B.68. Histograms of the differential extinction in various lens subsamples
from Falco et al. (1999). In each panel the solid histogram shows the full sample of
37 differential extinctions measured in 23 lenses while the shaded histogram shows
the distributions for different selection methods (radio/optical) or galaxy types
(early/late). The hatched region shows the extinction range consistent with the
Falco, Kochanek & Muñoz (1998) analysis of the difference between the statistics of
radio-selected and optically-selected lens samples (see §B.6.6). Note that the most
highly extincted systems, PKS1830-211 and B0218+357, are both radio-selected 
and late-type galaxies. The lowest differential extinction bins are contaminated by 
the effects of finite measurement errors. 






in some lenses, observational limitations have prevented them from being as 
useful as they are in other areas of astrophysics.

As we mentioned in §B.6, extinction is an important systematic problem
for estimating the cosmological model using the statistics of optically
selected lenses. It modifies the results by changing the effective magnification 
bias of the sample because it can make lensed quasars dimmer than their 
unlensed counterparts. Because we see multiple images of the same quasar, 
it is relatively easy to estimate the differential extinction between lensed im- 
ages under the assumption that the quasar spectral shapes are not varying 
on the time scale corresponding to the time delay between the images and 
that microlensing effects are not significantly changing the slope of the quasar
continuum. The former is almost certainly valid, while for the latter we sim- 
ply lack the necessary data to check the assumption (although we have a 
warning sign from the systems where the continuum and emission line flux 
ratios differ, see Part 4). Under these assumptions, the magnitude difference 
at wavelength A between two images A and B 



depends on the ratio of the image magnifications fiA/fJ-B) the differential ex- 
tinction AE(B — V) = Ea — Eb between the two images and the extinction 
law R(X/ (I + zi)) of the dust in the rest frame of the dust. We have the addi- 
tional assumption that either the extinction law is the same for both images or 
that one image dominates the total extinction (Nadeau et al. II99Ijl . Because 
it is a purely differential measurement that does not depend on knowing the 
intrinsic spectrum of the quasar, it provides a means of determining extinc- 
tions and extinction laws that is otherwise only achievable locally where we 
can obtain spectra of individual stars (the pair method, e.g. Cardelli, Clayton 
& Mathis 1989). The total extinction cannot be determined to any comparable
accuracy because estimates of the total extinction require an estimate 
of the intrinsic spectrum of the quasar. Fig. B.68 shows the distribution of 
differential extinctions found in the Falco et al. (1999) survey of extinction in 
23 gravitational lenses. Only 7 of the 23 systems had colors consistent with 
no extinction, and after correcting for measurement errors and excluding the 
two outlying, heavily extincted systems the data are consistent with a one-sided
Gaussian distribution of extinctions starting at zero and with a dispersion 
of $\sigma_{\Delta E} \simeq 0.1$ mag. The two outlying systems, B0218+357 and PKS1830-211,
were both radio-selected and both have one image that lies behind a 
molecular cloud of a late type lens galaxy (see below). 
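
The differential measurement described above is linear in its two unknowns, so it can be fitted by ordinary least squares. The following is a minimal sketch, assuming the relation m_A - m_B = -2.5 log(mu_A/mu_B) + dE * R(lambda/(1+z_l)) and assuming the caller supplies the extinction-law values R for each filter (no particular dust law is built in). The function name and the demonstration numbers are hypothetical and are not data from any real lens.

```python
import numpy as np

def fit_differential_extinction(delta_m, r_lambda, sigma=None):
    """Least-squares fit of delta_m = c + dE * R for the unknowns (c, dE).

    delta_m  : magnitude differences m_A - m_B, one per filter
    r_lambda : extinction-law values R(lambda/(1+z_l)) for the same filters
               (supplied by the caller; no dust law is assumed here)
    sigma    : optional 1-sigma uncertainties on delta_m
    Returns (mu_ratio, dE) with mu_ratio = mu_A/mu_B = 10**(-0.4 * c).
    """
    delta_m = np.asarray(delta_m, dtype=float)
    r_lambda = np.asarray(r_lambda, dtype=float)
    w = np.ones_like(delta_m) if sigma is None else 1.0 / np.asarray(sigma, dtype=float)
    # Weighted design matrix: a constant column (flux ratio) and R (extinction).
    design = np.column_stack([np.ones_like(r_lambda), r_lambda]) * w[:, None]
    (c, dE), *_ = np.linalg.lstsq(design, delta_m * w, rcond=None)
    return 10.0 ** (-0.4 * c), dE

# Synthetic check with made-up numbers (illustrative only).
r_demo = np.array([1.5, 2.3, 3.1, 4.1, 5.2])
true_ratio, true_dE = 2.0, 0.10
dm_demo = -2.5 * np.log10(true_ratio) + true_dE * r_demo
ratio_fit, dE_fit = fit_differential_extinction(dm_demo, r_demo)
```

The fit needs no knowledge of the intrinsic quasar spectrum, which is the point made in the text; it does inherit the assumptions that the spectrum is not varying and that microlensing is not changing the continuum slope.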

For lenses that have the right amount of dust, so that the image flux 
ratio can be measured accurately over a broad range of wavelengths, it is 
possible to estimate the extinction curve $R(\lambda/(1+z_l))$ of the dust (Nadeau 
et al. 1991) or to estimate the dust redshift under the assumption that the 
extinction curve is similar to those measured locally (Jean & Surdej 1998). 

Fig. B.69. The extinction curve of the dust in SBS0909+532 at $z_l = 0.83$ by 
Motta et al. (2002). The solid squares show the magnitude difference as a function 
of inverse rest wavelength derived from integral field spectra of the continuum of 
the quasars. The open squares are broad band measurements from earlier HST 
imaging and the filled triangles are the flux ratios in the quasar emission lines. The 
solid curve shows the best fit $R_V = 2.1 \pm 0.9$ Cardelli, Clayton & Mathis (1989) 
extinction curve while the dashed curve shows a standard $R_V = 3.1$ curve. The 
offset between the continuum and emission line flux ratios seems not to depend on 
wavelength and is probably due to microlensing. 

Starting with Nadeau et al. (1991), there have been many estimates of extinction
curves in lens galaxies (Falco et al. 1999, Toft, Hjorth & Burud 2000,
Motta et al. 2002, Muñoz et al. 2004). The most interesting of these are 
for systems where the region near the 2175 Å extinction feature is visible. 
This requires source and lens redshifts that put the feature at long enough 
wavelengths to be easily observed (i.e. higher lens redshifts) with a quasar 
UV continuum extending to shorter wavelengths (i.e. lower source redshifts). 
Motta et al. (2002) achieved the first cosmological detection of the feature 
in the $z_l = 0.83$ lens SBS0909+532, as shown in Fig. B.69. The overall extinction
curve is marginally consistent with a standard Galactic $R_V = 3.1$ 
extinction curve. Other cosmologically distant extinction curves are very different
from normal Galactic models, ranging from an anomalously low $R_V$ curve 
in MG0414+0534 at $z_l = 0.96$ (Falco et al. 1999), to a probable SMC extinction
curve in LBQS1009-252 at an estimated redshift of $z_l \simeq 0.88$ (Muñoz 
et al. 2004), and an anomalously high $R_V$ extinction curve for the dust in 
the molecular cloud of the $z_l = 0.68$ lens galaxy in B0218+357. The Jean & 
Surdej (1998) idea of using the shape of the extinction curve to estimate the 
redshift of the dust also seems to work given a reasonable amount of dust and 
wavelength coverage (see Falco et al. 1999, Muñoz et al. 2004), but too few 
lenses with unknown redshifts satisfy the requirements for widespread use of 
the method. 

For broad band radio emission from the source, the three observed prop- 
agation effects are free-free absorption, scatter broadening and Faraday ro- 
tation. For example, in PMNJ1632-0033, the candidate third image of the 
lens (C) has the same radio spectrum as the other two images except at the 
lowest frequency observed (1.4 GHz) where it is fainter than expected. This 
can be interpreted as free-free absorption by electrons at the center of the 
lens galaxy but the interpretation needs to be confirmed by measurements 
at additional frequencies to demonstrate that the dependence of the optical 
depth on wavelength is consistent with the free-free process (Winn, Rusin & 
Kochanek 2004). Scatter broadening is observed in many radio lenses (e.g. 
PMN0134-0931, Winn et al. 2003; B0128+437, Biggs et al. 2004; PKS1830-211,
Jones et al. 1996; B1933+503, Marlow et al. 1999), primarily as changes 
in the fluxes of images between high resolution VLBI observations and lower 
resolution VLA observations or apparently finite sizes for compact source 
components in VLBI observations. In the presence of a magnetic field, the 
scattering medium will also rotate polarization vectors (e.g. MG1131+0456, 
Chen & Hewitt 1993). This is only of practical importance if maps which 
depend on the polarization vector are used to constrain the lens potential. 
In short, these effects are observed but have so far been of little practical 
consequence. 

More surprisingly, absorption by atoms and molecules has also been of 
little practical import for lens physics as yet. Wiklind & Alloin (2002) provide 
an extensive review of molecular absorption and emission in gravitational 
lenses. The two systems with the strongest absorption systems are B0218+357 
and PKS1830-211 (see Gerin et al. 1997 and references therein) where one 
of the two images lies behind a molecular cloud of the spiral galaxy lens. 
These two systems also show the highest extinction of any lensed images 
(Falco et al. 1999). Molecular absorption systems can be used to determine 
time delays (Wiklind & Alloin 2002), measure the redshift of lens galaxies 
(the lens redshift in PKS1830-211 is measured using molecular absorption 
lines, Wiklind & Combes 1996), and potentially to determine the rotation 
velocity of the lens galaxy (e.g. Koopmans & de Bruyn 2003). These studies at 
centimeter and millimeter wavelengths are heavily limited by the resolution 
and sensitivity of existing instruments, and the importance of these radio 
absorption features will probably rise dramatically with the completion of 
the next generation of telescopes (e.g. ALMA, LOFAR, SKA). 

Similar problems face studies of metal absorption lines in the optical. 
Since most lenses are at modest redshifts, the strongest absorption lines ex- 
pected from the lens galaxies tend to be observable only from space because 
they lie at shorter wavelengths than the atmospheric cutoff. For most lenses 
only the MgII (2800 Å) lines are observable from the ground since you only 
require a lens redshift $z_l \gtrsim 0.26$ to get the redshifted absorption lines longwards
of 3500 Å. The other standard metal line, CIV (1549 Å), is only visible 
for $z_l > 1.25$, and we have no confirmed lens redshifts in this range. Spectroscopy
with HST can search for metal lines in the UV, but the integration
times tend to be prohibitively long unless the quasar images are very 
bright. Thus, while absorption lines either associated with the lens galaxy 
or likely to be associated with the lens galaxy are occasionally found (e.g. 
SDSS1650+4251, Morgan, Snyder & Reens 2003, or HE1104-1805, Lidman
et al. 2000), there have been no systematic studies of metal absorption 
in gravitational lenses. Nonetheless, some very bright quasar lenses are favored
targets for very high dispersion studies of the Ly$\alpha$ forest, particularly 
the four-image lens B1422+231 and the three-image lens APM08279+5255, 
because the lens magnification makes these systems anomalously bright for 
quasars at $z_s > 3$. 

B.10 Extended Sources and Quasar Host Galaxies 

As we saw in Figs. B.3, B.4 and B.8, we frequently see lensed emission 
from extended components of the source. These arcs and rings are important 
because they can supply the extra constraints needed to determine the radial
mass distribution that we lack in a simple two-image or four-image lens 
(§B.4.1). The magnification produced by gravitational lensing also allows us 
to study far fainter quasar host galaxies than is otherwise possible. Compar- 
isons of the luminosities and colors of high and low redshift host galaxies and 
the relative luminosities of the host and the quasar are important for under- 
standing the growth of supermassive black holes and their relationships with 
their parent halos. 

Modeling extended emission is more difficult than modeling point sources 
largely because of the complications introduced by the finite resolution of the 
observations. In this section we first discuss a simple theory of Einstein ring 
images, then some methods for modeling extended emission, and finally some 
results about the mass distributions of lenses and the properties of quasar 
host galaxies. All models of extended lensed sources start from the fact that 
lensing preserves the surface brightness of the source - what we perceive as 
magnification is only an artifact of the finite resolution of our observations. 
This can be modified by absorption in the ISM of the lens galaxy (e.g. see 
Koopmans et al. 2003), but we will neglect this complication in what follows. 
We start with a simple analytic model for the formation of Einstein rings, 
then discuss numerical reconstructions of lensed sources and their ability 
to constrain mass distributions, and end with a survey of the properties of 
quasar host galaxies. 

Fig. B.70. An illustration of ring formation by an SIE lens. An ellipsoidal source 
(left gray-scale) is lensed into an Einstein ring (right gray-scale). The source plane 
is magnified by a factor of 2.5 relative to the image plane. The tangential caustic 
(astroid on left) and critical line (right) are superposed. The Einstein ring curve is 
found by looking for the peak brightness along radial spokes in the image plane. 
For example, the spoke in the illustration defines point A on the ring curve. The 
long line segment on the right is the projection of the spoke onto the source plane. 
Point A corresponds to point A' on the source plane where the projected spoke 
is tangential to the intensity contours of the source. The ring in the image plane 
projects into the four-lobed pattern on the source plane. Intensity maxima along 
the ring correspond to the center of the source. Intensity minima along the ring 
occur where the ring crosses the critical curve (e.g. point B). The corresponding 
points on the source plane (e.g. B') are where the astroid caustic is tangential to 
the intensity contours. 

B.10.1 An Analytic Model for Einstein Rings 

Fig. B.71. The Einstein ring curves in PG1115+080 (top) and B1938+666 (bottom).
The black squares mark the lensed quasar or compact radio sources. The 
light black lines show the ring curve and its uncertainties. The black triangles show 
the intensity minima along the ring curve (but not their uncertainties). The best fit 
model ring curve is shown by the dashed curve, and the heavy solid curve shows the 
critical line of the best fit model. The model was not constrained to fit the critical 
line crossings. 

Most of the lensed extended sources we see are dominated by an Einstein 
ring - this occurs when the size of the source is comparable to the size of 
the astroid caustic associated with producing four-image lenses. When the 
Einstein ring is fairly thin, there is a simple analytic model for the formation 
of Einstein rings (Kochanek, Keeton & McLeod 2001). The important point 
to understand is that the ring is a pattern rather than a simple combination 
of multiple images. Mathematically, what we identify as the ring is the peak 
of the surface brightness as a function of angle around the lens galaxy. We 
can identify the peak by finding the location $\lambda(\chi)$ of the maximum intensity along radial 
spokes in the image plane, $\theta(\lambda) = \theta_0 + \lambda(\cos\chi, \sin\chi)$. At a given azimuth 
$\chi$ we find the extremum of the surface brightness of the image $f_D(\theta)$ along 
each spoke, and these lie at the solutions of 

$$0 = \partial_\lambda f_D(\theta) = \nabla_\theta f_D(\theta)\cdot\frac{d\theta}{d\lambda}. \qquad \text{(B.134)}$$

The next step is to translate the criterion for the ring location onto the source 
plane. In real images, the observed image $f_D(\theta)$ is related to the actual surface 
brightness $f_I(\theta)$ by a convolution with the beam (PSF), $f_D(\theta) = B * f_I(\theta)$, but 
for the moment we will assume we are dealing with a true surface brightness 
map. Under this assumption $f_D(\theta) = f_I(\theta) = f_S(\beta)$ because of surface 
brightness conservation. When we change variables, the criterion for the peak 
brightness becomes 

$$0 = \nabla_\beta f_S(\beta)\cdot M^{-1}\cdot\frac{d\theta}{d\lambda}, \qquad \text{(B.135)}$$

where the inverse magnification tensor $M^{-1} = d\beta/d\theta$ is introduced by the 
variable transformation. Geometrically we must find the point where the 
tangent vector of the curve, $M^{-1}\cdot d\theta/d\lambda$, is perpendicular to the local gradient 
of the surface brightness $\nabla_\beta f_S(\beta)$. These steps are illustrated in Fig. B.70. 
This result is true in general but not very useful. We next assume that 
the source has ellipsoidal surface brightness contours, $f_S(m^2)$, with
$m^2 = \Delta\beta\cdot S\cdot\Delta\beta$, where $\Delta\beta = \beta - \beta_0$ is the distance from the center of the source, 
$\beta_0$, and the matrix $S$ is defined by the axis ratio $q_s = 1 - \epsilon_s \leq 1$ and position
angle $\chi_s$ of the source. We must assume that the surface brightness 
declines monotonically, $df_S(m^2)/dm^2 < 0$, but require no additional assumptions
about the actual profile. With these assumptions the Einstein ring curve 
is simply the solution of 

$$0 = \Delta\beta\cdot S\cdot M^{-1}\cdot\frac{d\theta}{d\lambda}. \qquad \text{(B.136)}$$

The ring curve traces out a four (two) lobed cloverleaf pattern when projected 
on the source plane if there are four (two) images of the center of the source 
(see Fig. B.70). These lobes touch the tangential caustic at their maximum 
ellipsoidal distance from the source center, and these cyclic variations in the 
ellipsoidal radius produce the brightness variations seen around the ring. 
The surface brightness along the ring is defined by $f_I(\lambda(\chi),\chi)$ for a spoke 
at azimuth $\chi$ and distance $\lambda(\chi)$ found by solving Eqn. B.135. The extrema 
in the surface brightness around the ring are located at the points where 
$\partial_\chi f_I(\lambda(\chi),\chi) = 0$, which occurs only at extrema of the surface brightness 
of the source (the center of the source, $\Delta\beta = 0$, in the ellipsoidal model) for the maxima, or 
when the ring crosses a critical line of the lens and the magnification tensor 
is singular ($|M|^{-1} = 0$) for the minima. These are general results that 
do not depend on the assumption of ellipsoidal symmetry. 
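
The spoke construction can be checked numerically in the simplest case. The sketch below assumes a singular isothermal sphere lens (source position beta = theta - b theta/|theta|) and a circular Gaussian source; both choices, the function names, and all parameter values are illustrative assumptions rather than a model taken from the text. For this case the peak along each spoke should sit at r = b + beta_0 . e_r, where e_r is the radial unit vector of the spoke.

```python
import numpy as np

def sis_source_position(theta, b):
    """SIS lens equation: beta = theta - b * theta / |theta|."""
    r = np.linalg.norm(theta, axis=-1, keepdims=True)
    return theta - b * theta / r

def ring_radius(chi, b, beta0, width, lam=None):
    """Radius of peak surface brightness along the spoke at azimuth chi."""
    if lam is None:
        lam = np.linspace(0.2 * b, 2.0 * b, 20001)
    e_r = np.array([np.cos(chi), np.sin(chi)])
    theta = lam[:, None] * e_r          # spoke through the lens centre
    beta = sis_source_position(theta, b)
    # Surface brightness is conserved, so f_I(theta) = f_S(beta(theta)).
    f = np.exp(-0.5 * np.sum((beta - beta0) ** 2, axis=1) / width ** 2)
    return lam[np.argmax(f)]

b = 1.0
beta0 = np.array([0.10, 0.05])          # hypothetical source offset
chis = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
numeric = np.array([ring_radius(c, b, beta0, width=0.3) for c in chis])
analytic = b + beta0[0] * np.cos(chis) + beta0[1] * np.sin(chis)
```

The agreement between `numeric` and `analytic` is limited only by the spacing of the radial grid, and it does not depend on the width of the source, which illustrates that the ring curve is set by where the projected spoke is tangent to the source isophotes rather than by the source profile.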

For an SIE lens in an external shear field we can derive some simple 
properties of Einstein rings to lowest order in the various axis ratios. Let the 
SIE have critical radius $b$, axis ratio $q_l = 1 - \epsilon_l$ and put its major axis along 
$\theta_1$. Let the external shear have amplitude $\gamma$ and orientation $\chi_\gamma$. We let the 
source be an ellipsoid with axis ratio $q_s = 1 - \epsilon_s$ and a major axis angle $\chi_s$ 
located at position $(\beta\cos\chi_0, \beta\sin\chi_0)$ from the lens center. The tangential 
critical line of the lens lies at radius 

$$\frac{r_{crit}}{b} = 1 + \frac{\epsilon_l}{2}\cos 2\chi - \gamma\cos 2(\chi - \chi_\gamma), \qquad \text{(B.137)}$$

while the Einstein ring lies at 

$$\frac{r_E}{b} = 1 + \frac{\beta}{b}\cos(\chi - \chi_0) - \frac{\epsilon_l}{6}\cos 2\chi + \gamma\cos 2(\chi - \chi_\gamma). \qquad \text{(B.138)}$$

At this order, the Einstein ring is centered on the source position rather than 
the lens position. The orientation of the ring is generally perpendicular to 
that of the critical curve, although it need not be exactly so when the SIE 
and the shear are misaligned due to the differing coefficients of the shear 
and ellipticity terms in the two expressions. These results lead to a false 
impression that the results do not depend on the shape of the source. In 
making the expansion we assumed that all the terms were of the same order 
($\beta/b \sim \gamma \sim \epsilon_l \sim \epsilon_s$), but we are really doing an expansion in the ellipticity of 
the potential of the lens, which is $\sim \epsilon_l/3$, rather than the ellipticity of the density 
distribution of the lens, so second order terms in the shape of the source 
are as important as first order terms in the ellipticity of the potential. For 
example in a circular lens with no shear ($\epsilon_l = 0$, $\gamma = 0$) the ring is located at 

$$\frac{r_E}{b} = 1 + \frac{\beta}{b}\,\frac{(2-\epsilon_s)\cos(\chi-\chi_0) + \epsilon_s\cos(2\chi_s - \chi - \chi_0)}{2 - \epsilon_s + \epsilon_s\cos 2(\chi_s - \chi)}, \qquad \text{(B.139)}$$

which has only odd terms in its multipole expansion and converges slowly 
for flattened sources. In general, the ring shape is a weak function of the 
source shape only if the potential is nearly round and the source is almost 
centered on the lens. The structure of the lens potential dominates the even 
multipoles of the ring shape, while the structure of the source dominates the 
odd multipoles. 
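
To make the orientation statement concrete, the lowest-order curves can be evaluated directly. This sketch assumes the forms r_crit/b = 1 + (eps_l/2) cos 2chi - gamma cos 2(chi - chi_gamma) and r_E/b = 1 + (beta/b) cos(chi - chi_0) - (eps_l/6) cos 2chi + gamma cos 2(chi - chi_gamma); the helper names and parameter values are hypothetical. It extracts the quadrupole amplitude and orientation of each curve by projecting onto exp(2i chi).

```python
import numpy as np

def critical_line(chi, eps_l, gamma, chi_gamma):
    """Lowest-order tangential critical line r_crit/b."""
    return 1.0 + 0.5 * eps_l * np.cos(2 * chi) - gamma * np.cos(2 * (chi - chi_gamma))

def ring_curve(chi, eps_l, gamma, chi_gamma, beta_over_b, chi_0):
    """Lowest-order Einstein ring curve r_E/b."""
    return (1.0 + beta_over_b * np.cos(chi - chi_0)
            - eps_l / 6.0 * np.cos(2 * chi)
            + gamma * np.cos(2 * (chi - chi_gamma)))

def quadrupole(r, chi):
    """Amplitude a and orientation phi of the a*cos(2(chi - phi)) term of r(chi)."""
    c = 2.0 * np.mean(r * np.exp(2j * chi))
    return np.abs(c), 0.5 * np.angle(c)

# Uniform azimuth grid so that the projection onto exp(2i chi) is exact.
chi = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)

# Pure SIE (no shear) with a small source offset: illustrative values only.
crit_amp, crit_pa = quadrupole(critical_line(chi, 0.3, 0.0, 0.0), chi)
ring_amp, ring_pa = quadrupole(ring_curve(chi, 0.3, 0.0, 0.0, 0.05, 1.0), chi)
```

With no shear the ring quadrupole is one third of the critical-line quadrupole and is rotated by 90 degrees, while the source offset only enters the dipole. Trying misaligned shear values shows how the differing coefficients break the exact perpendicularity.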

In fact, the shape of the ring can be used to simply "read off" the amplitudes
of the higher order multipoles of the lens potential. This is nicely 
illustrated by an isothermal potential with arbitrary angular structure,
$\Psi = r b F(\chi)$ with $\langle F(\chi)\rangle = 1$ (see Zhao & Pronk 2001, Witt et al. 2000, Kochanek 
et al. 2001, Evans & Witt 2001), in the absence of any shear. The tangential 
critical line of the lens is 

$$\frac{r_{crit}}{b} = F(\chi) + F''(\chi). \qquad \text{(B.140)}$$

If $e_\theta$ and $e_\chi$ are radial and tangential unit vectors relative to the lens center 
and $\beta_0$ is the offset of the source from the lens center, then the Einstein 
ring curve is 

$$\frac{r_E}{b} = F(\chi) + F'(\chi)\frac{e_\chi\cdot S\cdot e_\theta}{e_\theta\cdot S\cdot e_\theta} + \frac{\beta_0\cdot S\cdot e_\theta}{b\,e_\theta\cdot S\cdot e_\theta} \rightarrow F(\chi) + \frac{\beta_0\cdot e_\theta}{b}, \qquad \text{(B.141)}$$

with the limit showing the result for a circular source. 
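
As a sketch of what "reading off" the multipoles could look like in practice, the following assumes the circular-source limit r_E/b = F(chi) + beta_0 . e_theta / b and a hypothetical F(chi) with small m = 2 and m = 4 terms. A Fourier transform of the ring curve returns the multipole amplitudes of F, and scaling each multipole by (1 - m^2) gives the critical line F + F''. All amplitudes and names here are illustrative assumptions.

```python
import numpy as np

n = 256
chi = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)

# Hypothetical angular structure with <F> = 1.
a2, a4 = -0.05, 0.01
F = 1.0 + a2 * np.cos(2 * chi) + a4 * np.cos(4 * chi)

# Ring curve for a circular source offset by beta_0/b at azimuth chi0.
beta_over_b, chi0 = 0.08, 0.6
ring = F + beta_over_b * np.cos(chi - chi0)

# Fourier coefficients of the ring curve.
coeff = np.fft.rfft(ring) / n
cos_amp = 2.0 * coeff.real      # coefficient of cos(m chi), valid for m >= 1
sin_amp = -2.0 * coeff.imag     # coefficient of sin(m chi), valid for m >= 1

# Critical line F + F'': each multipole m is multiplied by (1 - m^2),
# so the m = 1 term produced by the source offset drops out automatically.
crit = coeff[0].real + sum(
    (1 - k**2) * (cos_amp[k] * np.cos(k * chi) + sin_amp[k] * np.sin(k * chi))
    for k in range(2, 6)
)
crit_expected = 1.0 - 3.0 * a2 * np.cos(2 * chi) - 15.0 * a4 * np.cos(4 * chi)
```

The even multipoles of the ring recover the structure of the potential and the dipole recovers the source offset. For a flattened source the extra terms of the full ring expression would mix in, so this is only the circular-source illustration.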

Thus, by analyzing the multipole structure of the ring curve one can de- 
duce the multipole structure of the potential. While this has not been done 
non-parametrically, the ability of standard ellipsoidal models to reproduce 
ring curves strongly suggests that higher order multipoles cannot be significantly
different from the ellipsoidal scalings. Fig. B.71 shows two examples 
of fits to the ring curves in PG1115+080 and B1938+666 using SIE plus external
shear lens models. The major systematic problem with fitting the real 
data is that bright quasar images must frequently be subtracted from the 
image before the ring curve can be extracted, and this can lead to artifacts 
like the wiggle in the curve between the bright A1/A2 images of PG1115+080. 
Other than that, the accuracy with which the ellipsoidal (plus shear) models
reproduce the curves is consistent with the uncertainties. In both cases 
the host galaxy is relatively flat ($q_s = 0.58 \pm 0.02$ for PG1115+080 and 
$0.62 \pm 0.14$ for B1938+666). The flatness of the host explains the "boxiness" 
of the PG1115+080 ring, while the B1938+666 host galaxy shape is poorly 
constrained because the center of the host is very close to the center of the 
lens galaxy so the shape of the ring is insensitive to the shape of the source. 
Unless the source is significantly offset from the center of the lens, as we 
might see for the host galaxy of an asymmetric two-image lens, it does not 
constrain the radial density profile of the lens very well - after considerable 
algebraic effort you can show that the dependence on the radial structure 
scales as $|\Delta\beta|^4$. It can, however, help considerably in this circumstance because
it eliminates the angular degrees of freedom in the potential that make 
it impossible for two-image lenses to constrain the radial density profile at 
all. 

B.10.2 Numerical Models of Extended Lensed Sources 

Obviously the ring curve and its extrema are an abstraction of the real struc- 
ture of the lensed source. Complete modeling of extended sources requires a 
real model for the surface brightness of the source. In many cases it is suffi- 
cient to simply use a parameterized model for the source, but in other cases 
it is not. The basic idea in any non-parametric method is that there is an 
optimal estimate of the source structure for any given lens model. This is 
most easily seen if we ignore the smearing of the image by the beam (PSF) 
and assume that our image is a surface brightness map. Since surface brightness
is conserved by lensing, $f_I(\theta) = f_S(\beta)$. For any lens model with parameters
$p$, the lens equations define the source position $\beta(\theta, p)$ associated 
with each image position. If we had only single images of each source point, 
this would be useless for modeling lenses. However, in a multiply imaged 
region, more than one point on the image plane is mapped to the same 
point on the source plane. In a correct lens model, all image plane points 
mapped to the same source plane position should have the same surface 
brightness, while in an incorrect model, points with differing surface brightnesses
will be mapped to the same source point. This provided the basis 
for the first non-parametric method, sometimes known as the "Ring Cycle" 
method (Kochanek et al. 1989, Wallington, Kochanek & Koo 1995). Suppose 
source plane pixel $j$ is associated with image plane pixels $i = 1 \cdots n_j$ with 
surface brightness $f_i$ and uncertainties $\sigma_i$. The goodness of fit for this source 
pixel is 

$$\chi_j^2 = \sum_{i=1}^{n_j}\left(\frac{f_i - f_s}{\sigma_i}\right)^2, \qquad \text{(B.142)}$$

where $f_s$ will be our estimate of the surface brightness on the source plane. 
For each lens model we compute $\chi^2(p) = \sum_j \chi_j^2$ and then optimize the lens 
parameters to minimize the surface brightness mismatches. 
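
A minimal sketch of this statistic follows. It assumes the lens model has already been used to assign each image pixel to a source pixel (the integer array `source_index` below, which is a hypothetical input, not something defined in the text), and it takes the estimate f_s for each source pixel to be the inverse-variance weighted mean of the image pixels mapped to it, which is the value that minimizes the per-pixel sum of squares.

```python
import numpy as np

def ring_cycle_chi2(image_flux, image_sigma, source_index):
    """Sum over source pixels of sum_i ((f_i - f_s) / sigma_i)**2.

    image_flux   : surface brightness f_i of each image pixel
    image_sigma  : uncertainty sigma_i of each image pixel
    source_index : non-negative integer index j of the source pixel that each
                   image pixel maps to under the current lens model
    Returns (chi2, f_s) where f_s holds the estimated source surface brightness.
    """
    image_flux = np.asarray(image_flux, dtype=float)
    weight = 1.0 / np.asarray(image_sigma, dtype=float) ** 2
    source_index = np.asarray(source_index)
    n_src = source_index.max() + 1
    # Weighted mean per source pixel minimizes that pixel's chi^2.
    wsum = np.bincount(source_index, weights=weight, minlength=n_src)
    fsum = np.bincount(source_index, weights=weight * image_flux, minlength=n_src)
    f_s = np.divide(fsum, wsum, out=np.zeros(n_src), where=wsum > 0)
    chi2 = np.sum(weight * (image_flux - f_s[source_index]) ** 2)
    return chi2, f_s

# Toy example: four image pixels, two source pixels (made-up numbers).
flux = np.array([1.0, 1.0, 2.0, 2.0])
sigma = np.full(4, 0.1)
chi2_good, _ = ring_cycle_chi2(flux, sigma, [0, 0, 1, 1])   # consistent mapping
chi2_bad, _ = ring_cycle_chi2(flux, sigma, [0, 1, 0, 1])    # mismatched mapping
```

Source pixels with only one image pixel contribute nothing to the statistic, which is the code-level version of the statement that only multiply imaged regions constrain the model.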

The problem with this algorithm is that we never have images that are 
true surface brightness maps - they are always the surface brightness map 
convolved with some beam (PSF). We can generalize the simple algorithm 
into a set of linear equations. Although the source and lens plane are two- 
dimensional, the description is simplified if we simply treat them as a vector 
$f_S$ of source plane surface brightness and a vector $f_I$ of image plane flux 
densities (i.e. including any convolution with the beam). The two images are 
related by a linear operator $A(p)$ that depends on the parameters of the 
current lens model and the PSF. In the absence of a lens, $A$ is simply the 
real-space (PSF) convolution operator. In either case, the fit statistic 

$$\chi^2 = \frac{|f_I - A(p) f_S|^2}{\sigma^2} \qquad \text{(B.143)}$$

(with uniform uncertainties here, but this is easily generalized) must first be 
solved to determine the optimal source structure for a given lens model and 
then minimized as a function of the lens model. The optimal source structure 
$d\chi^2/df_S = 0$ leads to the equation that $f_S = A^{-1}(p) f_I$. The problem, which 
is the same as we discussed for non-parametric mass models in §B.4.7, is that 
a sufficiently general source model when combined with a PSF will lead to 
a singular matrix for which $A(p)^{-1}$ is ill-defined - physically, there will be 
wildly oscillating source models for which it is possible to obtain $\chi^2(p) = 0$. 

Three approaches have been used to solve the problem. The first is LensClean 
(Kochanek & Narayan 1992, Ellithorpe, Kochanek & Hewitt 1996, Wucknitz 2004),
which is based on the Clean algorithm of radio astronomy. Like 
the normal Clean algorithm, LensClean is a non-linear method using a prior 
that radio sources can be decomposed into point sources for determining the 
structure of the source. The second is LensMEM (Wallington, Kochanek & 
Narayan 1996), which is based on the Maximum Entropy Method (MEM) for 
image processing. The determination of the source structure is stabilized by 
minimizing $\chi^2 + \lambda\int d^2\beta\, f_S\ln(f_S/f_0)$ while adjusting the Lagrange multiplier 
$\lambda$ such that at the minimum $\chi^2 \sim N_{dof}$, where $N_{dof}$ is the number of degrees 
of freedom in the model. Like Clean/LensClean, MEM/LensMEM is a non-linear
algorithm in which solutions must be found iteratively. Both LensClean 
and LensMEM can be designed to produce only positive-definite sources. The 
third approach is linear regularization, where the source structure is stabilized 
by minimizing $\chi^2 + \lambda f_S\cdot H\cdot f_S$ (Warren & Dye 2003, Koopmans et al. 2003). 
The simplest choice for the matrix $H$ is the identity matrix, in which case 
the added criterion is to minimize the sum of the squares of the source flux. 
More complicated choices for $H$ will minimize the gradients or curvature of 
the source flux. The advantage of this scheme is that the solution is simply 
a linear algebra problem with $(A^T(p)A(p) + \lambda H) f_S = A^T(p) f_I$. 
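
The linear regularization step can be sketched in a few lines. The example below assumes the normal equations (A^T A + lambda H) f_S = A^T f_I with H defaulting to the identity, and builds a deliberately tiny toy operator (each of four source pixels imaged twice, followed by a three-point blur). The operator, pixel counts, and the value of lambda are hypothetical choices for illustration, not those of any published analysis.

```python
import numpy as np

def solve_regularized_source(A, f_image, lam, H=None):
    """Solve (A^T A + lam * H) f_S = A^T f_I for the source vector f_S."""
    A = np.asarray(A, dtype=float)
    if H is None:
        H = np.eye(A.shape[1])          # simplest choice: penalize sum of f_S**2
    return np.linalg.solve(A.T @ A + lam * H, A.T @ np.asarray(f_image, dtype=float))

# Toy lensing operator: image pixels 2j and 2j+1 both see source pixel j.
n_src, n_img = 4, 8
lens = np.zeros((n_img, n_src))
lens[np.arange(n_img), np.arange(n_img) // 2] = 1.0

# Toy PSF: three-point blur acting on the image plane.
blur = 0.5 * np.eye(n_img) + 0.25 * (np.eye(n_img, k=1) + np.eye(n_img, k=-1))
A_toy = blur @ lens

f_true = np.array([1.0, 3.0, 2.0, 0.5])
f_img = A_toy @ f_true                   # noise-free synthetic image
f_rec = solve_regularized_source(A_toy, f_img, lam=1e-3)
```

In this well-conditioned toy case a small lambda barely biases the recovered source. In realistic problems, where the source grid is fine compared to the PSF, the unregularized matrix becomes nearly singular and the choice of lambda and H controls the trade-off discussed in the text.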

In all three of these methods there are two basic systematic issues which 
need to be addressed. First, all the methods have some sort of adjustable 
parameter - the Lagrange multiplier $\lambda$ in LensMEM or the linear regularization
methods and the stopping criterion in the LensClean method. As the 
lens model changes, the estimates of the parameter errors will be biased if 
the treatment of the multiplier or the stopping criterion varies with changes 
in the lens model in some poorly understood manner. Second, it is difficult 
to work out the accounting for the number of degrees of freedom associated 
with the model for the source when determining the significance of differences 
between lens models. Both of these problems are particularly severe when 
comparing models where the size of the multiply imaged region depends on 
the lens model. Since only multiply imaged regions supply any constraints 
on the model, one way to improve the goodness of fit is simply to shrink the 
multiply imaged region so that there are fewer constraints. Since changes in 
the radial mass distribution have the biggest effect on the multiply imaged 
region, this makes estimates of the radial mass distribution particularly sen- 
sitive to controlling these biases. It is fair to say that all these algorithms 
lack a completely satisfactory understanding of this problem. For radio data 
there are added complications arising from the nature of interferometric ob- 
servations, which mean that good statistical models must work with the raw 
visibility data rather than the final images (see Ellithorpe et al. 1996). 

Fig. B.72. Models of 0047-2808 from Dye & Warren (2003). The right panel shows 
the lensed image of the quasar host galaxy after the foreground lens has been subtracted.
The middle panel shows the reconstructed source and its position relative 
to the tangential (astroid) caustic. The left panel shows the resulting constraints 
on the central exponent of the dark matter halo ($\rho \propto r^{-\gamma}$) and the stellar mass-to-light
ratio of the lens galaxy. The dashed contours show the constraints for the 
same model using the central velocity dispersion measurement from Koopmans & 
Treu (2003). 

These methods, including the effects of the PSF, have been applied to determining
the mass distributions in 0047-2808 (Dye & Warren 2003), B0218+357 
(Wucknitz, Biggs & Browne 2004), MG1131+0456 (Chen, Kochanek & Hewitt
1995) and MG1654+134 (Kochanek 1995a). We illustrate them with the 
Dye & Warren (2003) results for 0047-2808 in Fig. B.72. The mass distribution
consists of the lens galaxy and a cuspy dark matter halo, where Fig. B.72 
shows the final constraints on the mass-to-light ratio of the stars in the lens 
galaxy and the exponent of the central dark matter density cusp ($\rho \propto r^{-\gamma}$). 
The allowed parameter region closely resembles earlier results using either 
statistical constraints (Fig. B.32) or stellar dynamics (Fig. B.33). In fact, the 
results using the stellar dynamical constraint from Koopmans & Treu (2003) 
are superposed on the constraints from the host in Fig. B.72, with the host 
providing a tighter constraint on the mass distribution than the central ve- 
locity dispersion. The one problem with all these models is that they have too 
few degrees of freedom in their mass distributions by the standards we discussed
in §B.4.6. In particular, we know that four-image lenses require both 
an elliptical lens and an external tidal shear in order to obtain a good fit to 
the data (e.g. Keeton, Kochanek & Seljak 1997), while none of these models 
for the extended sources allows for multiple sources of the angular structure 
in the potential. In fact, the lack of an external shear probably drives the 
need for dark matter in the 0047-2808 models. Without dark matter, the 
decay of the stellar quadrupole and the low surface density at the Einstein 
ring means that the models generate too small a quadrupole moment to fit 
the data in the absence of a halo. The dark matter solves the problem both 
through its own ellipticity and the reduction in the necessary shear with a 
higher surface density near the ring (recall that $\gamma \propto 1 - \langle\kappa\rangle$). Again we see the 
need for a greater focus on the angular structure of the potential. 

Fig. B.73. The host galaxy in PG1115+080. The top left panel shows the 1-orbit 
NICMOS image from Impey et al. (1998). The top right panel shows the lensed 
host galaxy after subtracting the quasar images and the lens galaxy. The lower left 
panel shows the residuals after subtracting the host as well. For comparison, the 
lower right panel shows what an image of an unlensed PG1115+080 quasar and 
host would look like in the same integration time and on the same scales. The host 
galaxy is an H = 20.8 mag late-type galaxy (Sersic index n = 1.4) with a scale length 
of $R_e = 1.5h^{-1}$ kpc. The demagnified magnitude of the quasar is H = 19.0 mag. 
The axis ratio of the source, $q_s = 0.65 \pm 0.04$, is consistent with the estimate of 
$q_s = 0.58 \pm 0.02$ from the simpler ring curve analysis (§B.10.1, Fig. B.71, Kochanek, 
Keeton & McLeod 2001). 



B.10.3 Lensed Quasar Host Galaxies 

One advantage of studying lensed quasars is that the lens magnification enor- 
mously enhances the visibility of the quasar host. A typical HST PSF makes 
the image of a point source have a mean surface brightness that declines as 
$R^{-3}$ with distance $R$ from the quasar. Compared to an unlensed quasar, the 
host galaxy of a lensed quasar is stretched along the Einstein ring, leading 
to an improvement in the contrast between the host and the quasar of $\mu^2$ for 
an image magnified by $\mu$ - you gain $\mu^3$ by stretching the host away from 
the quasar and lose $\mu$ because the quasar is magnified. Perpendicular to the 
Einstein ring, the contrast becomes a factor of $\mu$ worse than for an unlensed 
quasar. Since the alignment of the magnification tensor relative to the host 
changes with each image, the segment of the host where contrast is lost will 
correspond to a segment where it is gained for another image, leading to a 
net gain for almost all parts of the source when you consider all the images. 
The distortions produced by lensing also mean the host structure is more 
easily distinguished from the PSF. In a few cases, like SDSS0924+0219 in 
Fig. B.58, microlensing or substructure may provide a natural coronagraph 
that suppresses the flux from the quasar but not that from the host. Despite 
naive expectations (and TAC comments), the distortions have little conse- 
quence for understanding the structure of the host even though a lens model 
is required to produce a photometric model of the host. 
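
The contrast argument is simple enough to write down as arithmetic. The sketch below assumes that the PSF wing falls as a power law in radius, that the quasar flux is boosted by mu, that the host surface brightness is conserved, and that the host is stretched by a factor mu along the ring but not at all perpendicular to it (that is, a radial magnification of order unity). The function name and the default slope are illustrative.

```python
def contrast_gain(mu, psf_slope=3.0, along_ring=True):
    """Host-to-PSF-wing contrast of a lensed image relative to the unlensed case.

    mu         : magnification of the image (taken to be purely tangential)
    psf_slope  : PSF wing surface brightness falls as R**(-psf_slope)
    along_ring : True for host light stretched along the Einstein ring,
                 False for the direction perpendicular to the ring
    """
    stretch = mu if along_ring else 1.0
    # Moving host light out by `stretch` lowers the PSF wing under it by
    # stretch**psf_slope, while the quasar itself is brighter by mu.
    return stretch ** psf_slope / mu

gain_tangential = contrast_gain(10.0)                  # 10**3 / 10
gain_radial = contrast_gain(10.0, along_ring=False)    # 1 / 10
```

For a magnification of 10 this gives a factor of 100 improvement along the ring and a factor of 10 loss perpendicular to it, matching the scalings quoted above.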

The only extensive survey of lensed quasar hosts is that of Peng (2004). 
Fig. B.73 shows the example of PG1115+080, a $z_s = 1.72$ radio-quiet quasar (RQQ).
The Einstein ring image is easily visible even in a short, one-orbit 
exposure. For comparison, we also took the final model for the quasar and 
its host and produced the image that would be obtained in the same time 
if we observed the quasar in the absence of lensing. It is quite difficult to see 
the host, and this problem will carry through in any numerical analysis. 

At low redshifts ($z < 1$), quasar host galaxies tend to be massive early-type
galaxies (e.g. McLure et al. 1999, Dunlop et al. 2003). Over 80% of 
quasars brighter than $M_V < -23.5$ mag are in early-type galaxies with $L \gtrsim 2L_*$
and effective radii of $R_e \sim 10$ kpc for $z \lesssim 0.5$. Radio quiet quasars (RQQ) 
tend to be in slightly lower luminosity hosts than radio loud quasars (RLQ) 
but only by factors of $\sim 2$ at redshift unity. Far fewer unlensed host galaxies 
have been detected above redshift unity (e.g. Kukula et al. 2001, Ridgway et 
al. 2001) with the surprising result that the host galaxies are 2-3 mag brighter 
than the typical host galaxy at low redshift and correspond to $\sim 4L_*$ 
galaxies. Given that the low redshift hosts were already very massive galaxies, 
it was expected that higher redshift hosts would have lower masses because 
they were still in the process of being assembled and forming stars (e.g. 
Kauffmann & Haehnelt 2000). One simple explanation was that by selecting 
from bright radio sources, these samples picked quasars with more massive 
black holes as the redshift increased, creating a bias in favor of more massive 
hosts. The key to checking for such a bias is to be able to detect far less 
luminous hosts, and the improved surface brightness contrast provided by 
lensing the host galaxies provides the means. 
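
As a reminder of what the magnitude differences in this discussion mean in linear units, the following minimal sketch converts a magnitude difference into a flux ratio using the standard definition of the magnitude scale. The function name is an illustrative assumption, and the conversion deliberately ignores k-corrections and evolution, so it is not a substitute for the comparison to $L_*$ made in the text.

```python
def flux_ratio(delta_mag):
    """Flux ratio for a source that is delta_mag magnitudes brighter.

    Standard magnitude scale: 5 magnitudes correspond to a factor of 100.
    Negative delta_mag gives the ratio for a fainter source.
    """
    return 10.0 ** (0.4 * delta_mag)


if __name__ == "__main__":
    # Hosts that are 2-3 mag brighter than typical low redshift hosts.
    for dm in (2.0, 3.0):
        print(f"{dm:.0f} mag brighter -> flux ratio {flux_ratio(dm):.1f}")
    # A host 4 mag fainter than its quasar.
    print(f"4 mag fainter -> flux ratio {flux_ratio(-4.0):.3f}")
```

A difference of 2-3 mag corresponds to a factor of roughly 6-16 in flux, while a host 4 mag fainter than the quasar carries only about 2.5% of the quasar flux.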

Fig. B.74 shows the observed H-band magnitudes of the lensed hosts as 
compared to low redshift host galaxies and other studies of high redshift 
host galaxies. Although 30% of the lensed quasars are radio-loud, they have 
luminosities similar to the lensed (or unlensed) radio-quiet hosts. There are 
no hosts as bright as the Kukula et al. (2001) radio-loud quasar hosts. Once 
the luminosities of the quasar and the host galaxy are measured we can 
compare them to the theoretical expectations (Fig. B.75). While the models 
agree with the data at low redshift, they are nearly disjoint by $z \sim 3$ in the 
sense that the observed quasars and hosts are significantly more luminous 
than predicted. The same holds for the Kukula et al. (2001) and Ridgway 
et al. (2001) samples, suggesting that black hole masses grow more rapidly 
than predicted by the theoretical models or that accretion efficiencies were 
higher in the past. Vestergaard (2004) makes a similar argument based on 
estimates of black hole masses from emission line widths. 

Fig. B.74. Observed H-band magnitudes of quasar host galaxies. The solid (open) 
circles are secure (more questionable) hosts detected in the CASTLES survey of 
lensed hosts. The low redshift points are from McLeod & McLeod (2001). All the 
Ridgway et al. (2001) systems are radio quiet. For comparison, we superpose the 
evolutionary tracks for a non-evolving E/S0 galaxy (solid curve), an evolving E/S0 
galaxy which starts forming stars at $z_f = 5$ with a 1 Gyr exponentially decaying 
star formation rate (long dashed line) and a star forming Sb/c model (short dashed 
line). The evolution models are matched to the luminosity of an $L_*$ early-type 
galaxy at redshift zero. The CASTLES observations can reliably detect hosts about 
4 magnitudes fainter than the quasar. From Peng (2004). 

Fig. B.75. A comparison of the estimated rest frame absolute magnitudes of 
the quasars and hosts to the theoretical models for the evolution 
of galaxies and the growth of black holes as a function of redshift by Kauffmann 
& Haehnelt (2000). The low redshift quasars from McLeod & McLeod (2001) 
occupy the triangle in the upper left panel. At intermediate redshift the lensed host 
galaxies occupy a region similar to the models, but the two distributions are nearly 
disjoint by $z \sim 3$. Both the hosts and the quasars are significantly more luminous 
than predicted. The horizontal line marks the luminosity of an $L_*$ galaxy at $z = 0$. 
From Peng (2004). 

B.11 Does Strong Lensing Have A Future? 

Well, you can hardly expect an answer of "No!" at this point, can you? Since 
we have just spent nearly 170 pages on the astrophysical uses of lenses, there 
is no point in reviewing all the results again here. Instead I suggest some 
goals for the future. 

Our first goal is to expand the sample of lenses from ~ 100 to ~ 1000. 
While 80 lenses seems like a great many compared to even a few years ago, it is 
still too few to pursue many interesting questions. The problem worsens if the 
analysis must be limited to lenses meeting other criteria (radio lenses, lenses 
found in a well-defined survey, lenses outside the cores of clusters, ...) or if the 
sample must be subdivided into bins (redshift, separation, luminosity, ...). 
For example, one of the most interesting applications of lenses will be to map 
out the halo mass function. This is difficult to do with any other approach 
because no other selection method works homogeneously on dark low-mass 
halos, galaxies of different types, groups and clusters. Unlike any other sample 
in astronomy, gravitational lenses are selected based on mass rather than 
luminosity, so the same search method works for all halos - the separation 
distribution of lenses is a direct mapping of the halo mass function. It is not 
a trivial mapping because the structure of halos changes with mass, but the 
systematics are far better than those of any other approach. The fact that 
lenses are mass-selected also gives them an enormous advantage in studying 
the evolution of galaxies with redshift over optically-selected samples where it 
will be virtually impossible to select galaxies in the same manner at both low 
and high redshift. There is no shortage of detectable lenses in the universe - it 
is simply a question of imaging enough of the sky at high angular resolution. 
The upgraded VLA and Merlin radio arrays are the most promising tools for 
this objective. 
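
To illustrate why subdividing the sample is so costly, here is a rough sketch of the counting statistics. It assumes, purely for illustration, that the lenses are spread evenly over the bins and that the only uncertainty is Poisson noise on the number of lenses per bin; neither assumption comes from the text, and the bin count of 10 is hypothetical.

```python
import math


def fractional_poisson_error(n_lenses, n_bins):
    """Fractional Poisson uncertainty per bin for an evenly divided sample.

    Assumes equal occupancy in every bin (an idealization).
    """
    if n_lenses <= 0 or n_bins <= 0:
        raise ValueError("n_lenses and n_bins must be positive")
    per_bin = n_lenses / n_bins
    return 1.0 / math.sqrt(per_bin)


if __name__ == "__main__":
    n_bins = 10  # hypothetical number of bins, e.g. in image separation
    for n_lenses in (80, 100, 1000):
        err = fractional_poisson_error(n_lenses, n_bins)
        print(f"{n_lenses:5d} lenses in {n_bins} bins: ~{100 * err:.0f}% per bin")
```

Under these assumptions, a sample of order 100 lenses leaves each bin with an uncertainty of roughly a third, while a sample of 1000 brings it down to about 10%.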

Our second goal is to systematically monitor the variability of as many 
lenses as possible. Time delays, if measured in large numbers and measured 
accurately, can resolve most of the remaining issues about the mass 
distributions of lenses. This is true even if you regard $H_0$ as unmeasured or 
uncertain - the Hubble constant is the same number for all lenses, so as the 
number of time delay systems increases, the contribution of the actual value 
of the Hubble constant to constraining the mass distribution diminishes. At 
the present time, we are certain that the typical early-type galaxy has a 
substantial dark matter halo, but we are uncertain how it merges with the 
luminous galaxy. Steady monitoring of microlensing of the source quasars by 
the stars in the lens galaxy will also help to resolve this problem because 
the patterns of the microlensing variability constrain both the stellar surface 
density near the lensed images and the total density (Part 4, Schechter & 
Wambsganss 2002). The constraints from time delays and microlensing will 
be complemented by the continued measurement of central velocity disper- 
sions. 
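
The argument that a shared Hubble constant matters less as the sample grows can be illustrated with a toy counting exercise. The sketch relies only on the fact that, for a fixed lens model and cosmological density parameters, every time delay scales as $1/H_0$. The "dimensionless delays" below are made-up numbers standing in for the model-dependent factors; they are not measurements of any real lens.

```python
import math


def predicted_delays(dimensionless_delays, h0):
    """Delays for a set of lenses, each scaling as (model factor) / H0."""
    return [d / h0 for d in dimensionless_delays]


def h0_free_ratios(delays):
    """Ratios of each delay to the first one; H0 cancels in every ratio."""
    reference = delays[0]
    return [d / reference for d in delays[1:]]


if __name__ == "__main__":
    # Hypothetical model-dependent factors for four lenses.
    factors = [1.0, 2.5, 0.7, 4.2]
    ratios_a = h0_free_ratios(predicted_delays(factors, 50.0))
    ratios_b = h0_free_ratios(predicted_delays(factors, 100.0))
    same = all(math.isclose(a, b) for a, b in zip(ratios_a, ratios_b))
    print("ratios independent of H0:", same)
    n = len(factors)
    print(f"{n} delays give {n - 1} H0-free constraints; H0 absorbs 1/{n}")
```

With N measured delays there are N - 1 ratios that do not depend on $H_0$ at all, so the single number absorbed by the Hubble constant becomes a progressively smaller share of the information available for constraining the mass distributions.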

Our third goal should be to obtain ultra-deep, high resolution radio maps 
of the lenses to search for central images in order to measure the central 
surface densities of galaxies and to search for supermassive black holes. 
Keeton (2003a) showed that the dynamic ranges of the existing radio maps of 
lenses are 1-2 orders of magnitude too small to routinely detect central images 
given the expected central surface densities of galaxies. Only very asymmetric 
doubles like PMN1632-0033, where Winn et al. (2004) have detected a 
central image, are likely to show central images with the present data. Once 
we reach the sensitivity needed to detect central images, we will also either 
find central black holes or set strict limits on their existence (Mao, Witt & 
Koopmans 2001). This is the only approach that can directly detect even 
quiescent black holes and determine their masses at cosmological distances. 
The existing limits could be considerably improved simply by co-adding the 
existing radio maps either for individual lenses or even for multiple lenses in 
order to obtain statistical limits. 
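
A rough sense of what co-adding can achieve follows from the usual inverse-variance stacking formula. The sketch below assumes that the maps are perfectly registered and that their noise is independent and Gaussian; real radio maps are often limited by calibration and deconvolution artifacts instead, so these numbers are best read as optimistic. The function names are illustrative assumptions.

```python
import math


def stacked_noise(sigmas):
    """RMS noise of an inverse-variance weighted average of several maps.

    Assumes independent Gaussian noise with per-map RMS values `sigmas`.
    """
    if not sigmas or any(s <= 0 for s in sigmas):
        raise ValueError("need at least one positive noise level")
    return 1.0 / math.sqrt(sum(1.0 / s ** 2 for s in sigmas))


def equal_maps_needed(gain):
    """Number of equal-noise maps needed to lower the noise by `gain`."""
    if gain < 1:
        raise ValueError("gain must be at least 1")
    return math.ceil(gain ** 2)


if __name__ == "__main__":
    print("four equal maps:", stacked_noise([1.0, 1.0, 1.0, 1.0]))
    for gain in (3, 10, 100):
        print(f"noise lower by x{gain}: {equal_maps_needed(gain)} equal maps")
```

Under these assumptions the noise falls only as the square root of the number of maps, so stacking gives a useful but limited improvement: closing a gap of one to two orders of magnitude in dynamic range this way alone would take of order 100 to 10,000 comparable maps.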

Our fourth goal should be to unambiguously identify a "dark" satellite of 
a lens galaxy. For starters we need to conduct complete statistical analyses 
of lens galaxy satellites in general, by determining their mass functions and 
radial distributions. As part of such an analysis we can obtain upper bounds 
on the number of dark satellites. Then, with luck, we will find an example 
of a lens that requires a satellite at a specific location for which there is no 
optical counterpart. This may be too conservative a condition. For example, 
Peng (2004) argues that much of the flux of Object X in MG0414+0534 
(Fig. B.6) is actually coming from lensed images of the quasar host galaxy 
rather than the satellite. 

Finally, lens magnification already means that it is far easier to do pho- 
tometry of a lensed quasar host galaxy than an unlensed galaxy. The next 
frontier is to measure the kinematics of cosmologically distant host galaxies. 
This might already be doable for the host galaxy of Q0957+561 at $z_s = 1.41$, 
but will generally require either JWST or the next generation of ground-based 
telescopes. With larger lens samples we may also find more cases like 
SDSS0924+0219 where gravitational lensing provides a natural coronagraph 
for the quasar. 